The Walsh transform is used extensively as a tool in determining whether a fitness
function over a binary string is deceptive or not. This thesis shows that the Walsh
transform method for detecting deception is easily generalized to functions over
non-binary strings such as ternary strings, strings with real parameters, and strings with
some binary and ternary characters and some real parameters. A generalization of
the Hadamard transform is then used to organize the generalized Walsh coefficients
into conditions for static deception for non-binary alphabets. The variances of fitness
of schemata are calculated using generalized Walsh coefficients. Mathematica code
for performing most of the calculations mentioned is included.
Chapter 1


Introduction 


1.1 Preview 


Walsh functions, introduced by the mathematician J. L. Walsh in 1923, recently 
have become popular tools in fields such as telecommunications engineering, radar 
systems, image recognition and processing, speech processing, coding systems, and 
spectroscopy (Beauchamp, 1975). Bethke (1981) introduced the Walsh-schema trans- 
form, a method of using Walsh functions for computing schema averages. Since then, 
the Walsh-schema transform has become a cornerstone in the theory of genetic algo- 
rithms (GAs). 

Most of the theory of genetic algorithms applies to GAs over binary strings, real
numbers, or permutations. Attempts to generalize this theory borrowed some notions
from set theory (Radcliffe, 1991; Vose & Liepins, 1991). Recently, Mason (1991)
extended the concept of partition coefficients (Bethke, 1981) from the theory of
binary-coded genetic algorithms to GAs over non-binary strings.

This thesis generalizes Walsh functions to non-binary strings and reworks some
of the existing theory of GAs to incorporate these new functions. Topics that will
be covered and extended include Bethke’s Walsh-schema transform, real-coded GAs
(Bledsoe, 1961; Goldberg, 1990a; Wright, 1991), detecting static deception and
Hadamard transforms (Homaifar & Qi, 1990; Homaifar, Qi, & Fost, 1991), and the
variance of fitness (Goldberg, Deb, & Clark, 1991; Goldberg & Rudnick, 1991;
Rudnick & Goldberg, 1991).


1.2 Notation


Throughout this thesis, n will refer to the length of the strings in the domain of the
fitness function. k will refer to the cardinality of the alphabet, or if each character
of a string is taken from a different alphabet, $k_1$ refers to the cardinality of the
alphabet for the first character, $k_2$ for the second, and so on; these cardinalities
form the vector $\vec{k} = (k_1, k_2, \ldots, k_n)$. A k-ary alphabet is an alphabet with k
characters. The characters in a k-ary alphabet will be represented by the integers 0
through $k - 1$. A k-ary string is a string all of whose characters are taken from the
same k-ary alphabet. A string that belongs to a $\vec{k}$-ary alphabet is a string whose first
character belongs to a $k_1$-ary alphabet, whose second character belongs to a $k_2$-ary
alphabet, etc.; such strings will be referred to as $\vec{k}$-ary strings. A string will be
represented by a vector whose components are the integer representations of the characters
in the string. For example, the hexadecimal (16-ary) string “3F2” would be denoted
by (3,15,2). k-ary Walsh functions are the generalization of Walsh functions to k-ary
alphabets, and $\vec{k}$-ary Walsh functions are the generalization to strings that belong to
$\vec{k}$-ary alphabets. Both of these generalizations of Walsh functions, as well as
generalizations of the Walsh functions to strings with characters which are real numbers,
will be referred to as generalized Walsh functions. The corresponding transforms will
be referred to as k-ary and $\vec{k}$-ary Walsh transforms.


Chapter 2 


History 


The question of what kind of problem is difficult for a GA started with Bledsoe (1961). 
Bledsoe described a situation he called lethal dependence in which a mutation in each 
of two genes would be an improvement, but a mutation in either gene alone would 
lead to death. 

Following Holland’s schema theorem (Holland, 1975), Bethke (1981) introduced 
the Walsh-schema transform for computing schema averages and gave some intu- 
itive conditions for problem difficulty that depended on the smoothness of the fitness 
function and the asymptotic behavior of the Walsh coefficients. 

The smoothness arguments were also used by Weinberger, who started with Eigen’s
model for natural selection (Weinberger, 1987) and studied correlation lengths on the
“landscape” of the fitness function (Weinberger, 1988; Weinberger, 1990). Kauffman
(Kauffman, 1989; Kauffman, 1990; Kauffman & Levin, 1987) created and analyzed
a model fitness landscape with tunable ruggedness, which was analyzed further in
Manderick, de Weger, & Spiessens (1991). Lipsitch (1991) used cellular automata
rules to generate fitness landscapes and performed simulations to discover which
classes of cellular automata created the hardest landscapes. These analyses, however,
dealt with hill-climbing on fitness functions and neglected the effects of crossover
and the usefulness of schemata. Also, Goldberg (1990b) pointed out that the asymptotic
behavior of the Walsh coefficients (and therefore the smoothness of the function)
is not enough to ensure the growth of important schemata.

Holland’s schema theorem came back into play when Goldberg (1987) defined
deception and introduced the minimal deceptive problem. The conditions for full static
deception, a situation in which all low-order schemata are misleading, were introduced
in Goldberg (1989b). The analysis of full static deception taking into account the
schema disruption due to crossover and mutation was done in Goldberg (1989c) using
operator-adjusted Walsh coefficients.

This line of reasoning gave rise to a host of techniques: a method for analyzing a
GA population in which schemata are not distributed uniformly (Bridges & Goldberg,
1989), a method that uses Hadamard transforms to organize the deceptive conditions
(Homaifar & Qi, 1990; Homaifar, Qi, & Fost, 1991), methods for constructing fully
deceptive and intermediate deceptive functions (Deb & Goldberg, 1991; Goldberg,
1990b; Liepins & Vose, 1991; Whitley, 1991a), and a simpler set of criteria sufficient
to ensure deception (Deb & Goldberg, 1992). This theory of deception has been used
to explain some experimental results, such as why a problem that is easy for a GA
might be difficult for a hill-climber (Wilson, 1991), and why certain GA test suites
were solved so easily (Das & Whitley, 1991; Davis, 1991). It also led to some attacks
(Forrest & Mitchell, 1991; Grefenstette, 1991; Mitchell & Forrest, 1991; Mitchell,
Forrest, & Holland, 1991; Tanese, 1989) arguing that the lack of deception, where
deception is defined by misleading schema averages, is not enough by itself to ensure GA
convergence to the global optimum.

Another mode of GA failure, apart from deception, has to do with the variance of
schema fitnesses and sampling error (Davidor, 1991; Liepins & Vose, 1990b; Schaffer,
Eshelman, & Offutt, 1991). A number of studies introduced techniques
for calculating schema fitness variances and signal-to-noise ratios and explained how
to size a GA population accordingly (Goldberg, Deb, & Clark, 1991; Goldberg &
Rudnick, 1991; Rudnick, 1991; Rudnick & Goldberg, 1991). Connected with the issue
of variance is multimodality, or having many high peaks that may confuse the GA
(Goldberg, Deb, & Horn, 1992). Goldberg, Deb, & Clark (1991) give an overview and
a summary of conditions for GA success.

Generalizations of this basic theory of convergence include applications to
permutation problems (Kargupta, Deb, & Goldberg, 1992; Sikora, 1991) and fitness
functions with real parameters (Goldberg, 1990a; Wright, 1991). On a more abstract
level are generalizations of the schema notion to arbitrary linear combinations of bits
(Liepins & Vose, 1990a), and arbitrary predicates (Vose & Liepins, 1991; Radcliffe,
1991). Mason (1991) generalized the notions of partition coefficients and static
deception to finite non-binary alphabets.

The other approaches to the question of GA convergence are few and sparse.
Hart and Belew (1991) point out that the general problem that the GA tries to
solve is NP-hard. Kauffman (1990) uses some concepts from information theory and
the physics of phase transitions to show a connection between information redundancy
and evolvability: minimal systems are not evolvable due to lethally dependent
parameters.

The work done on binary-coded GAs provides an extensive theoretical framework
that hinges on the Walsh-schema transform. This thesis will focus on the theory of
static deception generalized to non-binary alphabets and begins by generalizing the
Walsh transform.


Chapter 3 


Generalized Walsh Functions and Transforms


3.1 Functions Over k-ary Strings 


This chapter examines functions over strings of length n whose characters are taken
from a k-ary alphabet. For example, we can use this method to analyze a function over
strings from a ternary alphabet. The basic idea is to treat each character in the string as a
separate dimension, then take the n-dimensional Fourier series. The exact similarity
between the generalized Walsh transform and Fourier series will be discussed later.
The point is that it is useful to think of the Walsh functions as functions of n variables
rather than functions of a single variable.


3.1.1 k-ary Walsh Functions 


Definition 1 Define the k-ary Walsh functions as

$$\psi^{(k)}_{\vec{j}}(\vec{x}) = \frac{1}{k^{n/2}}\, e^{\frac{2\pi i}{k}\,\vec{j}\cdot\vec{x}}, \tag{3.1}$$

where the vector $\vec{j} = (j_1, j_2, \ldots, j_n)$ is the k-ary representation of j, and the vector
$\vec{x} = (x_1, x_2, \ldots, x_n)$ is the k-ary representation of x.


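Theorem 1 The k-ary Walsh functions satisfy the following normalization condition:

$$\sum_{\vec{x}} \bar{\psi}^{(k)}_{\vec{j}}(\vec{x})\, \psi^{(k)}_{\vec{l}}(\vec{x}) = \delta_{j_1 l_1} \delta_{j_2 l_2} \cdots \delta_{j_n l_n}, \tag{3.2}$$

where barred quantities refer to the complex conjugate and $\delta$ is the Kronecker delta.
The sum is over all distinct values of $\vec{x}$.

Proof

$$\begin{aligned}
\sum_{\vec{x}} \bar{\psi}^{(k)}_{\vec{j}}(\vec{x})\, \psi^{(k)}_{\vec{l}}(\vec{x})
 &= \frac{1}{k^{n}} \sum_{x_1=0}^{k-1} \sum_{x_2=0}^{k-1} \cdots \sum_{x_n=0}^{k-1} e^{\frac{2\pi i}{k}\left[(l_1 - j_1) x_1 + (l_2 - j_2) x_2 + \cdots + (l_n - j_n) x_n\right]} \\
 &= \prod_{p=1}^{n} \left( \frac{1}{k} \sum_{x_p=0}^{k-1} e^{\frac{2\pi i}{k} (l_p - j_p) x_p} \right) \\
 &= \delta_{j_1 l_1} \delta_{j_2 l_2} \cdots \delta_{j_n l_n}.
\end{aligned} \tag{3.3}$$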


When k = 2, the k-ary Walsh functions become the usual Walsh functions, up to a
normalization constant. Throughout this thesis, the normalization is chosen so that
the inner product of generalized Walsh functions is a Kronecker delta function. This
differs from the usual normalization convention for Walsh functions, but it has an
intuitive appeal when deriving theorems and proofs; these functions are orthonormal,
not merely orthogonal.
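The orthonormality claimed in Theorem 1 can be checked numerically for a small case. The following is a minimal sketch in Python rather than the Mathematica of Appendix A; the function names `walsh` and `inner_product` are illustrative assumptions, and the definition coded is the reading of (3.1) with normalization $k^{-n/2}$.

```python
import cmath
from itertools import product

def walsh(j, x, k):
    """k-ary Walsh function of (3.1): k**(-n/2) * exp(2*pi*i * (j . x) / k)."""
    n = len(j)
    dot = sum(jp * xp for jp, xp in zip(j, x))
    return cmath.exp(2j * cmath.pi * dot / k) / k ** (n / 2)

def inner_product(j, l, k):
    """Sum over all k-ary strings x of conj(psi_j(x)) * psi_l(x)."""
    n = len(j)
    return sum(walsh(j, x, k).conjugate() * walsh(l, x, k)
               for x in product(range(k), repeat=n))

k, n = 3, 2
indices = list(product(range(k), repeat=n))
# Largest deviation of the table of inner products from the Kronecker delta.
max_deviation = max(abs(inner_product(j, l, k) - (1.0 if j == l else 0.0))
                    for j in indices for l in indices)
print(max_deviation)              # should be at rounding-error level
print(walsh((1, 1), (0, 2), k))   # a single function value for n = 2, k = 3
```

For k = 2 the same code reproduces the ordinary Walsh functions scaled by $2^{-n/2}$.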

Example Take strings of length 2, n = 2, using a ternary alphabet, k = 3. Evaluating
the fourth function, $\vec{j} = (1,1)$, at the second position, $\vec{x} = (0,2)$, gives the
following:

$$\psi^{(3)}_{(1,1)}((0,2)) = \frac{1}{3}\, e^{\frac{2\pi i}{3}(1 \cdot 0 + 1 \cdot 2)} = -\frac{1}{6} - \frac{\sqrt{3}}{6}\, i. \tag{3.4}$$

Another way of looking at the Walsh functions $\psi^{(k)}_{\vec{j}}(\vec{x})$ is to think of them as (up
to normalization) $e^{i\phi}$, where the phase $\phi$ is the inner product between the index of the
function $\vec{j}$ and the position $\vec{x}$, and the distance metric for the inner product is such
that the phase increases by $2\pi$ as we traverse a dimension.


3.1.2 k-ary Walsh Transform


Now that the k-ary Walsh functions are defined, the k-ary Walsh transform can be
stated. To put it simply: a k-ary Walsh coefficient is the inner product of the fitness
function with a k-ary Walsh function, and the k-ary Walsh transform gives the k-ary
Walsh coefficients in terms of the fitness values.
Definition 2 (k-ary Walsh Transform) The Walsh coefficients w of a function f
are given by

$$w_{\vec{j}} = \sum_{\vec{x}} \bar{\psi}^{(k)}_{\vec{j}}(\vec{x})\, f(\vec{x}). \tag{3.5}$$

Notice that we use the complex conjugate of the Walsh function.

Example Use a ternary alphabet and length 2 string again. The Walsh coefficient
$w_{(0,1)}$ of a function f is given as follows:

$$\begin{aligned}
w_{(0,1)} ={}& \tfrac{1}{3} f((0,0)) + \tfrac{1}{3} e^{-2\pi i/3} f((0,1)) + \tfrac{1}{3} e^{-4\pi i/3} f((0,2)) \\
 &+ \tfrac{1}{3} f((1,0)) + \tfrac{1}{3} e^{-2\pi i/3} f((1,1)) + \tfrac{1}{3} e^{-4\pi i/3} f((1,2)) \\
 &+ \tfrac{1}{3} f((2,0)) + \tfrac{1}{3} e^{-2\pi i/3} f((2,1)) + \tfrac{1}{3} e^{-4\pi i/3} f((2,2)).
\end{aligned} \tag{3.6}$$

Definition 3 (Inverse k-ary Walsh Transform) The inverse transform is given
by

$$f(\vec{x}) = \sum_{\vec{j}} \psi^{(k)}_{\vec{j}}(\vec{x})\, w_{\vec{j}}, \tag{3.7}$$

where the sum is over all possible values of the k-ary string $\vec{j}$.
Notice that the Walsh function is not conjugated in the inverse transform. In practice,
the generalized Walsh transform and its inverse would be computed by the generalized
Fast Walsh Transform algorithm and the inverse generalized Fast Walsh Transform
listed in Appendix A.
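Theorem 2 The inverse k-ary Walsh transform of a k-ary Walsh transform is the
identity transformation.

Proof Notice from the definition of the k-ary Walsh function (3.1) that
$\psi^{(k)}_{\vec{j}}(\vec{x}) = \psi^{(k)}_{\vec{x}}(\vec{j})$. Then

$$\begin{aligned}
\sum_{\vec{j}} \psi^{(k)}_{\vec{j}}(\vec{x})\, w_{\vec{j}}
 &= \sum_{\vec{j}} \psi^{(k)}_{\vec{j}}(\vec{x}) \sum_{\vec{y}} \bar{\psi}^{(k)}_{\vec{j}}(\vec{y})\, f(\vec{y}) \\
 &= \sum_{\vec{y}} f(\vec{y}) \sum_{\vec{j}} \bar{\psi}^{(k)}_{\vec{y}}(\vec{j})\, \psi^{(k)}_{\vec{x}}(\vec{j}) \\
 &= \sum_{\vec{y}} f(\vec{y})\, \delta_{x_1 y_1} \delta_{x_2 y_2} \cdots \delta_{x_n y_n} \\
 &= f(\vec{x}).
\end{aligned} \tag{3.8}$$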


The method of using these functions to determine deception will be discussed in a
later chapter. The procedure is a straightforward extension of the binary case. First,
the orthogonality of the k-ary Walsh functions is used to find k-ary Walsh coefficients
for the fitness function to be analyzed. These coefficients are used to compute the
schema averages, and the schema averages are used to say something about deception.
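The transform pair of Definitions 2 and 3 and the round trip of Theorem 2 can be sketched directly from the sums. This is a brute-force illustration, not the Fast Walsh Transform of Appendix A; the fitness function below and the names `walsh_transform` and `inverse_walsh_transform` are hypothetical.

```python
import cmath
from itertools import product

def walsh(j, x, k):
    dot = sum(a * b for a, b in zip(j, x))
    return cmath.exp(2j * cmath.pi * dot / k) / k ** (len(j) / 2)

def walsh_transform(f, k, n):
    """w_j = sum over x of conj(psi_j(x)) * f(x); f is a dict keyed by strings."""
    strings = list(product(range(k), repeat=n))
    return {j: sum(walsh(j, x, k).conjugate() * f[x] for x in strings)
            for j in strings}

def inverse_walsh_transform(w, k, n):
    """f(x) = sum over j of psi_j(x) * w_j (no conjugate in the inverse)."""
    strings = list(product(range(k), repeat=n))
    return {x: sum(walsh(j, x, k) * w[j] for j in strings) for x in strings}

k, n = 3, 2
# Hypothetical fitness values on ternary strings of length 2.
f = {x: (x[0] - x[1]) ** 2 + x[0] for x in product(range(k), repeat=n)}
w = walsh_transform(f, k, n)
f_back = inverse_walsh_transform(w, k, n)
round_trip_error = max(abs(f_back[x] - f[x]) for x in f)
print(round_trip_error)   # should be at rounding-error level
```

The direct sums cost on the order of $k^{2n}$ operations, which is why a fast algorithm is preferable for anything but small examples.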


3.1.3 Some Theorems and Proofs


The following are a few theorems about the k-ary Walsh transforms. Most of them 
are simple and add to our intuition of how they work. 


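x     | (0,0) | (0,1) | (0,2) | (1,0) | (1,1) | (1,2) | (2,0) | (2,1) | (2,2)
------|-------|-------|-------|-------|-------|-------|-------|-------|------
f     | 0     | 1     | 2     | 3     | 4     | 5     | 12    | 13    | 14
f_1   | 0     | 0     | 0     | 3     | 3     | 3     | 12    | 12    | 12
f_2   | 0     | 1     | 2     | 0     | 1     | 2     | 0     | 1     | 2

Table 3.1: Values of f, f_1, and f_2 for various values of x.


j     | (0,0) | (0,1) | (0,2) | (1,0) | (1,1) | (1,2) | (2,0) | (2,1) | (2,2)
------|-------|-------|-------|-------|-------|-------|-------|-------|------
w_1   | 15 | 0 | 0 | $(-15 + 3^{5/2} i)/2$ | 0 | 0 | $(-15 - 3^{5/2} i)/2$ | 0 | 0
w_2   | 3 | $(-3 + 3^{1/2} i)/2$ | $(-3 - 3^{1/2} i)/2$ | 0 | 0 | 0 | 0 | 0 | 0
w     | 18 | $(-3 + 3^{1/2} i)/2$ | $(-3 - 3^{1/2} i)/2$ | $(-15 + 3^{5/2} i)/2$ | 0 | 0 | $(-15 - 3^{5/2} i)/2$ | 0 | 0

Table 3.2: Generalized Walsh coefficients of the functions in Table 3.1.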
Theorem 3 A function is additively separable if $f(\vec{x}) = f_1(x_1) + f_2(x_2) + \cdots$. A
function is additively separable if and only if its k-ary Walsh transform is also additively
separable.

Proof The proof of this comes immediately from the fact that the transform is
linear.

Example Consider again the ternary alphabet and let $f((x_1, x_2)) = 3x_1^2 + x_2$. Let
$f_1(x_1, x_2) = 3x_1^2$ and $f_2(x_1, x_2) = x_2$ so that $f(x_1, x_2) = f_1(x_1, x_2) + f_2(x_1, x_2)$. See
Tables 3.1 and 3.2 for the function values and generalized Walsh coefficients.


Note that $w_1 + w_2 = w$. Although this theorem is straightforward, it stresses the
idea that it is useful to think of the characters in the string as being independent 
variables. Also, functions which are partially additively separable are often used 
in testing genetic algorithms. Partially additively separable functions also arise in 
real applications. For example, in designing a high-performance engine using genetic 
algorithms, the first three characters in the string might code for the type of steel 
used, while the next two characters might code for the fuel mixture. Intuitively, these 
two characteristics are mostly independent; the optimal fuel mixture does not depend 
strongly upon the type of steel used, and the best kind of steel to use does not depend 
strongly upon the fuel mixture. We can express the ideas above mathematically as 
follows: 


$$f((x_1, x_2, x_3, x_4, x_5)) = f_{123}(x_1, x_2, x_3) + f_{45}(x_4, x_5) + O(\epsilon), \tag{3.9}$$

where $\epsilon$ is small compared to one, and all the functions f, $f_{123}$, and $f_{45}$ are of order
one. In this case,

$$w_{(j_1, j_2, j_3, j_4, j_5)} = u_{(j_1, j_2, j_3)} + v_{(j_4, j_5)} + O(\epsilon). \tag{3.10}$$

To put it more simply: if the fitness function is well-approximated by a partially
additively separable function, then the k-ary Walsh transform is also well-approximated
by a partially additively separable function.
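The ternary example of Theorem 3 can be reproduced numerically. The sketch below assumes the normalization of (3.1) and the conjugated transform of (3.5); the helper names are illustrative. It checks that the coefficients of $f = 3x_1^2 + x_2$ are the sums of those of $f_1$ and $f_2$, and that each part only populates the coefficients whose other index is zero.

```python
import cmath
from itertools import product

K, N = 3, 2
STRINGS = list(product(range(K), repeat=N))

def walsh(j, x):
    dot = sum(a * b for a, b in zip(j, x))
    return cmath.exp(2j * cmath.pi * dot / K) / K ** (N / 2)

def transform(f):
    """Walsh coefficients of a function given as a Python callable."""
    return {j: sum(walsh(j, x).conjugate() * f(x) for x in STRINGS)
            for j in STRINGS}

w1 = transform(lambda x: 3 * x[0] ** 2)          # f_1 depends on x_1 only
w2 = transform(lambda x: x[1])                   # f_2 depends on x_2 only
w = transform(lambda x: 3 * x[0] ** 2 + x[1])    # f = f_1 + f_2

additivity_error = max(abs(w[j] - (w1[j] + w2[j])) for j in STRINGS)
# w1 should vanish unless j_2 = 0, and w2 should vanish unless j_1 = 0.
w1_leak = max(abs(w1[j]) for j in STRINGS if j[1] != 0)
w2_leak = max(abs(w2[j]) for j in STRINGS if j[0] != 0)
print(additivity_error, w1_leak, w2_leak)
print(w[(0, 0)], w1[(0, 0)], w2[(0, 0)])
```

The same check applies to partially separable functions such as (3.9): coefficients that mix indices from both groups of characters measure the size of the coupling term.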
Theorem 4 The average of the function keeping the first character fixed at 0 and
varying the other characters is given by the following sum:

$$\frac{1}{k^{n-1}} \sum_{x_2, \ldots, x_n} f((0, x_2, \ldots, x_n)) = \frac{1}{k^{n/2}} \left( w_{(0,0,0,\ldots)} + w_{(1,0,0,\ldots)} + \cdots + w_{(k-1,0,0,\ldots)} \right). \tag{3.11}$$

Note that $w_{(0,0,\ldots,0)}$ is the average of the function (times a normalization constant).

Notice that the summation over all the variables of the function except the first
character has been reduced to a single summation in transform space. This is why
the k-ary Walsh transform is useful. The average of a function of strings of length
n over a schema with m fixed positions can be expressed as a sum over $k^m$ k-ary
Walsh coefficients. This idea is used later in the chapter on using generalized Walsh
functions to calculate schema averages.

Example Let us take $f((x_1, x_2)) = 3x_1^2 + x_2$ and use a ternary alphabet as in the
previous example above. Then the average of the function setting $x_1 = 0$ and letting
$x_2$ vary is

$$\frac{1}{3} \left[ f((0,0)) + f((0,1)) + f((0,2)) \right] = \frac{1}{3} \left( w_{(0,0)} + w_{(1,0)} + w_{(2,0)} \right). \tag{3.12}$$

Had f been a function of strings of length 3 instead of 2, the left hand side of the above
equation would have 9 terms, while the right hand side would still have 3 terms.
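The schema average in this example can be computed both ways. The following sketch assumes the transform convention of (3.5) with normalization $k^{-n/2}$; the variable names are illustrative.

```python
import cmath
from itertools import product

K, N = 3, 2
STRINGS = list(product(range(K), repeat=N))

def walsh(j, x):
    dot = sum(a * b for a, b in zip(j, x))
    return cmath.exp(2j * cmath.pi * dot / K) / K ** (N / 2)

def f(x):
    return 3 * x[0] ** 2 + x[1]

w = {j: sum(walsh(j, x).conjugate() * f(x) for x in STRINGS) for j in STRINGS}

# Left-hand side: average f over the strings with the first character fixed at 0.
direct = sum(f((0, x2)) for x2 in range(K)) / K ** (N - 1)
# Right-hand side: sum the coefficients whose index is zero at the free position.
via_walsh = sum(w[(j1, 0)] for j1 in range(K)) / K ** (N / 2)
print(direct, via_walsh)
```

Both quantities come out as the same schema average; the imaginary parts of the individual coefficients cancel in the sum.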


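Theorem 5 The average of the square of the absolute value of the function is equal
to the average of the square of the absolute value of the k-ary Walsh transform of the
function.

This is analogous to the same relation in Fourier transforms. The usefulness of taking
the average of $|f|^2$ will become apparent in the chapter on the variance of fitness.

Proof

$$\begin{aligned}
\sum_{\vec{x}} \bar{f}(\vec{x})\, f(\vec{x})
 &= \sum_{\vec{x}} \sum_{\vec{p}} \bar{\psi}^{(k)}_{\vec{p}}(\vec{x})\, \bar{w}_{\vec{p}} \sum_{\vec{q}} \psi^{(k)}_{\vec{q}}(\vec{x})\, w_{\vec{q}} \\
 &= \sum_{\vec{p}, \vec{q}} \sum_{\vec{x}} \bar{\psi}^{(k)}_{\vec{p}}(\vec{x})\, \psi^{(k)}_{\vec{q}}(\vec{x})\, \bar{w}_{\vec{p}}\, w_{\vec{q}} \\
 &= \sum_{\vec{p}, \vec{q}} \delta_{p_1 q_1} \delta_{p_2 q_2} \cdots \delta_{p_n q_n}\, \bar{w}_{\vec{p}}\, w_{\vec{q}} \\
 &= \sum_{\vec{p}} \bar{w}_{\vec{p}}\, w_{\vec{p}}.
\end{aligned} \tag{3.13}$$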


Example Let $f((x_1, x_2)) = 3x_1^2 + x_2$ as before. The sum of squares of f over the
entire x space is 564, and so is the sum of the squares of the absolute values of the
generalized Walsh coefficients.
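The value 564 quoted above can be confirmed on both sides of Theorem 5. This is a brute-force sketch under the conventions of (3.1) and (3.5); the names are illustrative.

```python
import cmath
from itertools import product

K, N = 3, 2
STRINGS = list(product(range(K), repeat=N))

def walsh(j, x):
    dot = sum(a * b for a, b in zip(j, x))
    return cmath.exp(2j * cmath.pi * dot / K) / K ** (N / 2)

def f(x):
    return 3 * x[0] ** 2 + x[1]

w = {j: sum(walsh(j, x).conjugate() * f(x) for x in STRINGS) for j in STRINGS}

sum_f_squared = sum(f(x) ** 2 for x in STRINGS)
sum_w_squared = sum(abs(w[j]) ** 2 for j in STRINGS)
print(sum_f_squared, sum_w_squared)
```

Because the Walsh functions are orthonormal, no extra normalization factor appears between the two sums.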

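Instead of summing the squares of f over the entire x space, let us sum it over
just the strings with $x_1 = 0$:

$$\begin{aligned}
\sum_{x_2, x_3, \ldots, x_n} \bar{f}(\vec{x})\, f(\vec{x})
 &= \sum_{x_2, x_3, \ldots, x_n} \sum_{\vec{p}} \bar{\psi}^{(k)}_{\vec{p}}(\vec{x})\, \bar{w}_{\vec{p}} \sum_{\vec{q}} \psi^{(k)}_{\vec{q}}(\vec{x})\, w_{\vec{q}} \\
 &= \sum_{\vec{p}, \vec{q}} \sum_{x_2, x_3, \ldots, x_n} \bar{\psi}^{(k)}_{\vec{p}}(\vec{x})\, \psi^{(k)}_{\vec{q}}(\vec{x})\, \bar{w}_{\vec{p}}\, w_{\vec{q}} \\
 &= \frac{1}{k} \sum_{\vec{p}, \vec{q}} \delta_{p_2 q_2} \delta_{p_3 q_3} \cdots \delta_{p_n q_n}\, \bar{w}_{\vec{p}}\, w_{\vec{q}} \\
 &= \frac{1}{k} \sum_{p_1, q_1} \sum_{p_2, p_3, \ldots, p_n} \bar{w}_{\vec{p}}\, w_{(q_1, p_2, p_3, \ldots, p_n)}.
\end{aligned} \tag{3.14}$$

Notice that the sum on the left has $k^{n-1}$ terms, while the sum on the right has $k^{n+1}$
terms. So working in the transform space makes for more work in this case.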
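A numerical check of the restricted sum is sketched below for the running ternary example. The factor 1/k on the transform side is an assumption that follows from the normalization $k^{-n/2}$ of (3.1); the term counts illustrate why the transform side is more work here.

```python
import cmath
from itertools import product

K, N = 3, 2
STRINGS = list(product(range(K), repeat=N))

def walsh(j, x):
    dot = sum(a * b for a, b in zip(j, x))
    return cmath.exp(2j * cmath.pi * dot / K) / K ** (N / 2)

def f(x):
    return 3 * x[0] ** 2 + x[1]

w = {j: sum(walsh(j, x).conjugate() * f(x) for x in STRINGS) for j in STRINGS}

# Left side: squares of f over the strings with x_1 = 0 (k**(n-1) terms).
lhs_terms = [f((0, x2)) ** 2 for x2 in range(K)]
# Right side: products of coefficients sharing every index except the first
# (k**(n+1) terms), divided by k.
rhs_terms = [w[(p1, p2)].conjugate() * w[(q1, p2)]
             for p1 in range(K) for q1 in range(K) for p2 in range(K)]
lhs = sum(lhs_terms)
rhs = sum(rhs_terms) / K
print(lhs, rhs, len(lhs_terms), len(rhs_terms))
```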


3.2 Functions Over $\vec{k}$-ary Strings


This section considers functions over $\vec{k}$-ary strings, strings whose characters are taken
from different alphabets. The ideas from the previous sections in this chapter are still
valid, and the definitions need to be modified only slightly.


3.2.1 $\vec{k}$-ary Walsh Functions


Recall that $k_p$ is the size of the alphabet for the p-th character in a string. Define the
$\vec{k}$-ary Walsh functions in the following manner:

$$\psi^{(\vec{k})}_{\vec{j}}(\vec{x}) = \prod_{p=1}^{n} \frac{1}{k_p^{1/2}}\, e^{2\pi i\, j_p x_p / k_p}. \tag{3.15}$$

Again, the normalization condition is given by the following:

$$\sum_{\vec{x}} \bar{\psi}^{(\vec{k})}_{\vec{p}}(\vec{x})\, \psi^{(\vec{k})}_{\vec{q}}(\vec{x}) = \delta_{p_1 q_1} \delta_{p_2 q_2} \cdots \delta_{p_n q_n}. \tag{3.16}$$

The proof of the normalization works exactly the same way as it did before. The
form of the transform (3.5) and inverse transform (3.7) also remains the same.
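For strings whose characters come from different alphabets, the only change is that each character contributes its own phase and its own normalization factor. The sketch below assumes the product form with a factor $k_p^{-1/2}$ per character; `walsh_mixed` is an illustrative name, and `math.prod` needs Python 3.8 or later.

```python
import cmath
import math
from itertools import product

def walsh_mixed(j, x, ks):
    """Walsh function for mixed alphabets: each character p has its own size k_p."""
    phase = sum(jp * xp / kp for jp, xp, kp in zip(j, x, ks))
    return cmath.exp(2j * cmath.pi * phase) / math.sqrt(math.prod(ks))

ks = (2, 3)   # first character binary, second character ternary
strings = list(product(*(range(k) for k in ks)))
max_deviation = max(
    abs(sum(walsh_mixed(p, x, ks).conjugate() * walsh_mixed(q, x, ks)
            for x in strings) - (1.0 if p == q else 0.0))
    for p in strings for q in strings)
print(max_deviation)   # should be at rounding-error level
```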
3.3 Functions Over Reals


As a computer internally represents real numbers as integers, one might suspect that
there is some similarity between analyzing genetic algorithms whose fitness functions
are defined over strings whose characters are taken from large alphabets and analyzing
GAs with fitness functions over reals. This suspicion is correct, and with this in mind,
real variables will be referred to as characters taken from $\infty$-ary alphabets, strings
that have one real variable followed by one ternary character will be referred to as
$(\infty, 3)$-ary strings, and so on. Despite the similarities, the analysis of functions with
real variables is not exactly the same as the analysis of functions with characters from
large alphabets; therefore the $\infty$ symbol should be taken to signify a real variable
rather than a character from an infinitely large alphabet.

Previously, it was demonstrated that the generalized Walsh transform was equivalent
to taking the Fourier transform along each dimension. All one needs to do
is apply this same idea. If the fitness function is a function of two real variables
and three discrete variables, then simply take the Fourier transform along all five
dimensions.

To avoid normalization constants, this thesis assumes that the real variables of
the fitness function are always in the unit interval [0,1]. One can often rescale the real
variables in the fitness function so that this is true. Later, the analysis of functions
whose variables are in the range $[0, \infty]$ and $[-\infty, \infty]$ will be discussed.
Consider a function over $(\infty, \infty, 3, 3, 3)$-ary strings. That is, a function that takes
two real variables and three ternary characters. The generalized Walsh coefficients of
this function are given by:

$$w_{(j_1, j_2, j_3, j_4, j_5)} = 3^{-3/2} \int_0^1 dx_1 \int_0^1 dx_2 \sum_{x_3=0}^{2} \sum_{x_4=0}^{2} \sum_{x_5=0}^{2} f(\vec{x})\, e^{-2\pi i (x_1 j_1 + x_2 j_2 + x_3 j_3/3 + x_4 j_4/3 + x_5 j_5/3)}, \tag{3.17}$$

and the function is recovered from its coefficients by

$$f(\vec{x}) = 3^{-3/2} \sum_{j_1=-\infty}^{\infty} \sum_{j_2=-\infty}^{\infty} \sum_{j_3=0}^{2} \sum_{j_4=0}^{2} \sum_{j_5=0}^{2} w_{(j_1, j_2, j_3, j_4, j_5)}\, e^{2\pi i (x_1 j_1 + x_2 j_2 + x_3 j_3/3 + x_4 j_4/3 + x_5 j_5/3)}. \tag{3.18}$$
Notice that the sums over $j_1$ and $j_2$ run from $-\infty$ to $\infty$, and that there is no
normalization factor associated with $x_1$ and $x_2$ because they range from 0 to 1.

In practice, the sum from $-\infty$ to $\infty$ would be approximated by a sum from $-A$
to A, where A is some large integer constant. This will yield an arbitrarily accurate
approximation whenever the infinite sum converges. Although the necessary and
sufficient conditions for convergence are beyond the scope of this thesis and are a topic
for future research, piecewise smooth functions can always be approximated by this
method (Tolstov, 1962); this approximation will be discussed in the next section of this chapter.
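The effect of the cutoff A can be illustrated for a single real variable. The sketch below is not taken from the thesis: the fitness function $x(1 - x)$, the midpoint-rule integration, and the helper names are all assumptions chosen for illustration. It estimates the coefficients $w_j$ on [0,1], rebuilds the function from the terms with $|j| \le A$, and records the worst-case error on a grid.

```python
import cmath

def fourier_coefficient(f, j, samples=2000):
    """Midpoint-rule estimate of w_j = integral over [0,1] of f(x) exp(-2 pi i j x)."""
    h = 1.0 / samples
    return h * sum(f((m + 0.5) * h) * cmath.exp(-2j * cmath.pi * j * (m + 0.5) * h)
                   for m in range(samples))

def truncated_series(coeffs, x):
    """Sum of w_j exp(2 pi i j x) over the stored indices j."""
    return sum(w * cmath.exp(2j * cmath.pi * j * x) for j, w in coeffs.items()).real

def f(x):
    return x * (1.0 - x)   # hypothetical fitness function on [0, 1]

all_coeffs = {j: fourier_coefficient(f, j) for j in range(-32, 33)}
grid = [m / 100 for m in range(101)]
errors = {}
for A in (2, 8, 32):
    kept = {j: w for j, w in all_coeffs.items() if abs(j) <= A}
    errors[A] = max(abs(truncated_series(kept, x) - f(x)) for x in grid)
print(errors)   # the worst-case error shrinks as the cutoff A grows
```

Keeping $w_j$ and $w_{-j}$ together, as the symmetric cutoff does, is what keeps the reconstruction real-valued.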
3.4 Reduction to Smaller Alphabets


One of the most useful properties of these generalized Walsh transforms is
that functions over reals or large alphabets can sometimes be well-approximated by
functions over small alphabets. Consider a function over a single real variable. The
generalized Walsh transform is equivalent to the Fourier transform in this case. It
is well-known that if the first m derivatives of the function are continuous, and the
$(m+1)$-th derivative is discontinuous, then for large j, the magnitudes of the Fourier
coefficients fall as $j^{-m-2}$ (Lighthill, 1959). Thus, if the function is smooth enough,
then one can get good approximations to schema averages by dropping the generalized
Walsh coefficients with high spatial frequencies. Goldberg (1990a) theorizes that a
high-cardinality GA performs a reduction to smaller alphabets; this section gives an
explanation of when this can occur, what it means in terms of generalized Walsh
coefficients, and how to take advantage of it when analyzing a fitness function.
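The link between smoothness and the decay of the coefficients can be seen numerically. In the sketch below, both functions are hypothetical stand-ins rather than the thesis's test functions: a step function (discontinuous itself) and $x(1 - x)$ (continuous, but with a discontinuous first derivative when extended periodically). The decay exponent is estimated from the coefficients at j = 1 and j = 9.

```python
import cmath
import math

def fourier_coefficient(f, j, samples=4000):
    """Midpoint-rule estimate of w_j = integral over [0,1] of f(x) exp(-2 pi i j x)."""
    h = 1.0 / samples
    return h * sum(f((m + 0.5) * h) * cmath.exp(-2j * cmath.pi * j * (m + 0.5) * h)
                   for m in range(samples))

def step(x):
    return 1.0 if x < 0.5 else 0.0      # jump discontinuity

def kink(x):
    return x * (1.0 - x)                # continuous, derivative jumps at the ends

def decay_exponent(f, j_low=1, j_high=9):
    """Slope of log|w_j| against log j between two odd indices."""
    low = abs(fourier_coefficient(f, j_low))
    high = abs(fourier_coefficient(f, j_high))
    return math.log(high / low) / math.log(j_high / j_low)

step_exponent = decay_exponent(step)
kink_exponent = decay_exponent(kink)
print(step_exponent, kink_exponent)   # close to -1 and -2 respectively
```

Odd indices are used because the even-indexed coefficients of this particular step function vanish.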


Example Consider the following test function:


Definition 4 (Test Function 1)

$$f(x) = \begin{cases} \ldots & \ldots \\ 0 & \text{otherwise} \end{cases} \tag{3.19}$$

This function has no continuous derivatives, and is discontinuous itself. Thus, the
generalized Walsh coefficients $w_j$ fall as $j^{-1}$. See Figure 3.1 for comparisons of the
function with the approximations. Notice that since f is real-valued, the generalized
Walsh coefficients $w_j$ and $w_{-j}$ are complex conjugates; and to get a real-valued
approximation, if we keep $w_j$ in our sum, we must also keep $w_{-j}$.

Figure 3.1: Test Function 1. Jump discontinuity slows convergence.


Example Consider the function:


Definition 5 (Test Function 2)

$$f(x) = \ldots \tag{3.20}$$

[Panels: 3-term, 5-term, 7-term, and 9-term approximations of Test Function 2.]

Figure 3.2: Test Function 2. Smoother functions converge faster.


Since $\partial^2 f / \partial x^2$ equals 1/4 at x = 0 and -1/4 at x = 1, this function has a
discontinuous second derivative. Therefore, the generalized Walsh coefficients fall as $j^{-3}$.
See Figure 3.2 for comparisons of this function with the approximations. Notice how
much faster the approximations converge to this function than the previous one. If
a k-term generalized Walsh approximation is satisfactory, then the function can be
treated as if it were a function over a k-ary alphabet. Not only does this make
computing schema averages simpler, but it also makes determining deception much easier.
Example Consider a function $f((x_1, x_2))$ where $x_1$ is a real variable in [0,1] and $x_2$ is
a character from a ternary alphabet. Let $w_{(j_1, j_2)}$ be the generalized Walsh transform
of f. Let $w'_{(j_1, j_2)}$ be the (5,2)-ary approximation of w. Then we have the following
relations between $w'$ and w:

$$\begin{aligned}
w'_{(0, j_2)} &= 5^{1/2}\, w_{(0, j_2)}; \\
w'_{(1, j_2)} &= 5^{1/2}\, w_{(1, j_2)}; \\
w'_{(2, j_2)} &= 5^{1/2}\, w_{(2, j_2)}; \\
w'_{(3, j_2)} &= 5^{1/2}\, w_{(-2, j_2)}; \\
w'_{(4, j_2)} &= 5^{1/2}\, w_{(-1, j_2)}.
\end{aligned} \tag{3.21}$$





The reason this works is that k-ary generalized Walsh functions are periodic with 
period k, and therefore the Walsh coefficients with indices —m correspond to Walsh 
coefficients with indices k — m. The mapping used above keeps the 5 x 2 Walsh 
coefficients with the lowest spatial frequency and fixes the normalization. 
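The index mapping and the factor $k^{1/2}$ can be checked on a function with only a few nonzero Fourier coefficients, for which the reduction is exact. The sketch below drops the second index $j_2$ and works with a single real variable; the coefficient values and helper names are hypothetical.

```python
import cmath
import math

# Hypothetical band-limited function of one real variable, defined through its
# Fourier coefficients w_j (w_j and w_{-j} are conjugates, so f is real-valued).
w = {0: 0.3, 1: 0.5, -1: 0.5, 2: -0.25j, -2: 0.25j}

def f(x):
    return sum(c * cmath.exp(2j * cmath.pi * j * x) for j, c in w.items()).real

def reduce_to_k_ary(coeffs, k):
    """w'_j = sqrt(k) * w_j for the low indices, sqrt(k) * w_{j-k} for the rest."""
    half = k // 2
    return [math.sqrt(k) * coeffs.get(j if j <= half else j - k, 0.0)
            for j in range(k)]

def k_ary_transform(samples):
    """k-ary Walsh transform of a length-1 string function given by its k values."""
    k = len(samples)
    return [sum(s * cmath.exp(-2j * cmath.pi * j * m / k)
                for m, s in enumerate(samples)) / math.sqrt(k)
            for j in range(k)]

k = 5
reduced = reduce_to_k_ary(w, k)
# Treat the character value m as the sample point x = m / k.
sampled = k_ary_transform([f(m / k) for m in range(k)])
mismatch = max(abs(a - b) for a, b in zip(reduced, sampled))
print(mismatch)   # should be at rounding-error level for a band-limited function
```

For a function with nonzero coefficients beyond the retained band, the two sides differ by the aliased high-frequency terms, which is the approximation error of the reduction.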

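Similarly, to reduce a function over (100,2)-ary strings to a function over (7,2)-ary
strings, one would use the following mapping:

$$\begin{aligned}
w'_{(0, j_2)} &= \frac{7^{1/2}}{100^{1/2}}\, w_{(0, j_2)}; \\
w'_{(1, j_2)} &= \frac{7^{1/2}}{100^{1/2}}\, w_{(1, j_2)}; \\
w'_{(2, j_2)} &= \frac{7^{1/2}}{100^{1/2}}\, w_{(2, j_2)}; \\
w'_{(3, j_2)} &= \frac{7^{1/2}}{100^{1/2}}\, w_{(3, j_2)}; \\
w'_{(4, j_2)} &= \frac{7^{1/2}}{100^{1/2}}\, w_{(97, j_2)}; \\
w'_{(5, j_2)} &= \frac{7^{1/2}}{100^{1/2}}\, w_{(98, j_2)}; \\
w'_{(6, j_2)} &= \frac{7^{1/2}}{100^{1/2}}\, w_{(99, j_2)}.
\end{aligned} \tag{3.22}$$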


Again, one simply takes the 7 x 2 Walsh coefficients with the lowest spatial frequency 
and fixes the normalization. 
Let us do one final example. Consider again the Test Function 2 (3.20) we used
earlier. The Walsh coefficients are given by the following:

$$\begin{aligned}
w_0 &= 0; \\
w_1 &= -0.000265108\,i; \\
w_{-1} &= 0.000265108\,i; \\
w_2 &= -0.000124861\,i; \\
w_{-2} &= 0.000124861\,i; \\
w_3 &= -0.0000175818\,i; \\
w_{-3} &= 0.0000175818\,i.
\end{aligned} \tag{3.23}$$

The 7-ary approximation to Test Function 2 (3.20) would have Walsh coefficients $w'$,
which are the following:

$$\begin{aligned}
w'_0 &= 7^{1/2} w_0 = 0; \\
w'_1 &= 7^{1/2} w_1 = -0.000701409\,i; \\
w'_2 &= 7^{1/2} w_2 = -0.000330351\,i; \\
w'_3 &= 7^{1/2} w_3 = -0.0000465171\,i; \\
w'_4 &= 7^{1/2} w_{-3} = 0.0000465171\,i; \\
w'_5 &= 7^{1/2} w_{-2} = 0.000330351\,i; \\
w'_6 &= 7^{1/2} w_{-1} = 0.000701409\,i.
\end{aligned} \tag{3.24}$$


Since a genetic algorithm's population is always finite in practice, it is impossible to
have individuals distributed over the entire real line from $-\infty$ to $\infty$ uniformly. In the
initial population, one must distribute the individuals according to some probability
distribution with a finite integral. For every probability distribution P(x) with a finite
integral, one can take a random variable uniformly distributed between 0 and 1 and
transform it into one distributed according to P(x) with an appropriate change of
variable: x = g(r). For instance, if r is a random variable uniformly distributed between
0 and 1, then $x = 1/r - 1$ is a random variable distributed between 0 and $\infty$, and
$y = \tan(\pi(r - 1/2))$ is a random variable distributed between $-\infty$ and $\infty$.
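The two changes of variable can be tried out directly. The sketch below uses evenly spaced values of r as a deterministic stand-in for uniform random samples; the function names are illustrative, and the formula for the half-line map, $x = 1/r - 1$, is the reading assumed here.

```python
import math

def half_line(r):
    """Map r in (0, 1) to x = 1/r - 1 in (0, infinity)."""
    return 1.0 / r - 1.0

def real_line(r):
    """Map r in (0, 1) to y = tan(pi * (r - 1/2)) in (-infinity, infinity)."""
    return math.tan(math.pi * (r - 0.5))

n = 1000
rs = [(m + 0.5) / n for m in range(n)]   # evenly spaced stand-ins for uniform draws
xs = [half_line(r) for r in rs]
ys = [real_line(r) for r in rs]

# Fractions of the mapped population that fall at or below 1.
frac_x_below_1 = sum(x <= 1.0 for x in xs) / n
frac_y_below_1 = sum(y <= 1.0 for y in ys) / n
print(min(xs), max(xs), frac_x_below_1)
print(min(ys), max(ys), frac_y_below_1)
```

Under these maps, half of the population lands below x = 1 on the half line, and three quarters lands below y = 1 on the full line, so most individuals stay at moderate values while a few reach far out.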

Example Consider an initial population of $\infty$-ary strings x of length 1 distributed
with the normalized probability distribution P(x). Let the fitness function be f(x).
Rather than considering f(x), it is easier to consider $f(g(x'))$, where the function g
is defined such that $x'$ is distributed evenly between 0 and 1:

$$\int_{-\infty}^{x} P(y)\, dy = g^{-1}(x). \tag{3.25}$$

Here, $g^{-1}(x)$ refers to the inverse of the function g. Under the change of variable
$x' = g^{-1}(x)$, the initial population of strings $x'$ is uniformly distributed in the
interval [0,1].

This method of transforming an unbounded real variable into a real variable
uniformly distributed in [0,1] allows us to analyze GAs over unbounded real variables
without introducing arbitrary cutoffs or generalizing nonuniform Walsh-schema
transforms (Bridges & Goldberg, 1989).


3.5 Generalizing the Fast Walsh Transform 


The Fast Walsh Transform (Goldberg, 1989b) can be generalized in a similar manner
to the Walsh transform. In the ordinary Fast Walsh Transform, one places the
function values f(x) on a binary tree; the position of f(x) on the tree corresponds to
the representation of x. For example, f((0,1,0,0)) would be found by starting at the
root, taking the left branch, then the right, then a left, and finally another left.

One descends the tree and processes it level by level. At each level, one applies
the algorithm described below to each node in that level before descending to
the next level. The algorithm is the following: take the left subtree l of the node, add
it to the right subtree r of the node, and call the result l'; take the left subtree l of the
node, subtract the right subtree r, and call the result r'. Now replace the left subtree with l'
and the right subtree with r'. To put it more briefly: (l, r) → (l + r, l - r), where the
tree-wise addition and subtraction means to add or subtract like components of l and r.
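The level-by-level rule can be written compactly if the tree is stored as a flat list, with the first character of the string selecting the top-level half. The sketch below is an illustration in Python, not the Mathematica program of Appendix A; it omits the normalization factor $2^{-n/2}$ used elsewhere in this thesis, and compares the result against the direct sum.

```python
def fast_walsh_transform(values):
    """Unnormalized binary Fast Walsh Transform using (l, r) -> (l + r, l - r)."""
    a = list(values)
    size = len(a)              # assumed to be a power of two
    half = size // 2           # start at the root: left half versus right half
    while half >= 1:
        for start in range(0, size, 2 * half):
            for i in range(start, start + half):
                l, r = a[i], a[i + half]
                a[i], a[i + half] = l + r, l - r
        half //= 2             # descend one level of the tree
    return a

def direct_walsh_transform(values):
    """Direct sum: w_j = sum over x of (-1)**(j . x) * f(x), bits as characters."""
    size = len(values)
    return [sum(v * (-1) ** bin(j & x).count("1") for x, v in enumerate(values))
            for j in range(size)]

f_values = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical fitness values, strings of length 3
fast = fast_walsh_transform(f_values)
direct = direct_walsh_transform(f_values)
print(fast)
print(direct)
```

The fast version performs n passes over the $2^n$ values, whereas the direct sum touches every pair of index and position.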

To generalize this to functions over k-ary strings, form a k-ary tree and descend
level by level. Instead of applying the rule (l, r) → (l + r, l - r), we apply the following
rule:

$$(c_1, c_2, \ldots, c_k) \rightarrow (c'_1, c'_2, \ldots, c'_k). \tag{3.26}$$

The basic idea is the same as with the ordinary Fast Walsh Transform, but instead
of simply adding and subtracting the children at each node, one performs a Fourier
transform upon the children. That is, $\vec{c}\,' = \mathrm{FT}[\vec{c}]$. Note that if we have a large
alphabet, it is worthwhile to perform this Fourier transform using a Fast Fourier
transform.

Note that if we have a $\vec{k}$-ary alphabet, then we can still perform the Fast Walsh
Transform. The root node of our tree would have $k_1$ children; all the nodes at the
next lower level would have $k_2$ children, etc. We still go down the tree level by level
and apply the Fourier transform upon the children of each node.
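A flat-list version of the generalized algorithm is sketched below. It is an illustration under stated assumptions rather than the Appendix A program: the Fourier transform at each node is taken with the conjugate phase of the forward transform and a factor $k_p^{-1/2}$ per level so that the result matches the orthonormal convention, and the function names are hypothetical.

```python
import cmath
import math
from itertools import product

def generalized_fwt(values, ks):
    """Apply a small Fourier transform to the children of every node, level by level."""
    a = [complex(v) for v in values]
    total = len(a)
    if total != math.prod(ks):
        raise ValueError("number of values must equal the product of alphabet sizes")
    stride = total
    for k in ks:                       # one tree level per character
        stride //= k                   # spacing between sibling subtrees
        for block in range(0, total, stride * k):
            for offset in range(stride):
                idx = [block + offset + c * stride for c in range(k)]
                children = [a[i] for i in idx]
                for m, i in enumerate(idx):
                    a[i] = sum(children[c] * cmath.exp(-2j * cmath.pi * m * c / k)
                               for c in range(k)) / math.sqrt(k)
    return a

def direct_transform(values, ks):
    """Brute-force transform for comparison, strings in lexicographic order."""
    strings = list(product(*(range(k) for k in ks)))
    norm = math.sqrt(math.prod(ks))
    return [sum(v * cmath.exp(-2j * cmath.pi *
                              sum(jp * xp / kp for jp, xp, kp in zip(j, x, ks)))
                for x, v in zip(strings, values)) / norm
            for j in strings]

ks = (2, 3)
values = [1, 2, 3, 4, 5, 6]            # hypothetical f values for (2,3)-ary strings
fast = generalized_fwt(values, ks)
slow = direct_transform(values, ks)
max_difference = max(abs(a - b) for a, b in zip(fast, slow))
print(max_difference)                  # should be at rounding-error level
```

For large alphabets the inner sum over the children would be replaced by a Fast Fourier Transform, as noted above.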
Example Figure 3.3 shows a generalized Fast Walsh Transform for a function over
(2,3)-ary strings.

Mathematica 2.0 programs that compute k-ary and $\vec{k}$-ary Fast Walsh Transforms
and their inverses are given in Appendix A.

This chapter has shown that the Walsh functions and transforms can be
generalized to non-binary strings. The next chapter shows that the generalized Walsh
functions and transforms can be used to compute schema averages.
Figure 3.3: Generalized Fast Walsh Transform for (2,3)-ary strings.
Chapter 4


Using Generalized Walsh Coefficients to Determine Schema Averages


Determining schema averages is the main reason for using these transform
methods. In Chapter 3, this thesis stated that a sum of f over m characters of a
string of length n turned into a sum of generalized Walsh coefficients over n - m
characters. For this reason, the average of a function over a schema with m positions
fixed turns into a sum over m characters of the Walsh coefficients.
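Example

Consider strings of length 3 taken from a (2,4,3)-ary alphabet. Some
examples of schema averages are as follows:

$$\begin{aligned}
f((*,*,*)) &= \tfrac{1}{\sqrt{24}}\, w_{(0,0,0)}; \\
f((0,*,*)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} + w_{(1,0,0)} \right); \\
f((1,*,*)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} - w_{(1,0,0)} \right); \\
f((*,0,*)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} + w_{(0,1,0)} + w_{(0,2,0)} + w_{(0,3,0)} \right); \\
f((*,1,*)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} + i\, w_{(0,1,0)} - w_{(0,2,0)} - i\, w_{(0,3,0)} \right); \\
f((*,2,*)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} - w_{(0,1,0)} + w_{(0,2,0)} - w_{(0,3,0)} \right); \\
f((*,3,*)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} - i\, w_{(0,1,0)} - w_{(0,2,0)} + i\, w_{(0,3,0)} \right); \\
f((*,*,0)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} + w_{(0,0,1)} + w_{(0,0,2)} \right); \\
f((*,*,1)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} + e^{2\pi i/3} w_{(0,0,1)} + e^{4\pi i/3} w_{(0,0,2)} \right); \\
f((0,0,*)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} + w_{(0,1,0)} + w_{(0,2,0)} + w_{(0,3,0)} \right. \\
 &\qquad \left. {} + w_{(1,0,0)} + w_{(1,1,0)} + w_{(1,2,0)} + w_{(1,3,0)} \right); \\
f((0,1,1)) &= \tfrac{1}{\sqrt{24}} \left( w_{(0,0,0)} + e^{2\pi i/3} w_{(0,0,1)} + e^{4\pi i/3} w_{(0,0,2)} \right. \\
 &\qquad {} + i\, w_{(0,1,0)} + i e^{2\pi i/3} w_{(0,1,1)} + i e^{4\pi i/3} w_{(0,1,2)} \\
 &\qquad {} - w_{(0,2,0)} - e^{2\pi i/3} w_{(0,2,1)} - e^{4\pi i/3} w_{(0,2,2)} \\
 &\qquad {} - i\, w_{(0,3,0)} - i e^{2\pi i/3} w_{(0,3,1)} - i e^{4\pi i/3} w_{(0,3,2)} \\
 &\qquad {} + w_{(1,0,0)} + e^{2\pi i/3} w_{(1,0,1)} + e^{4\pi i/3} w_{(1,0,2)} \\
 &\qquad {} + i\, w_{(1,1,0)} + i e^{2\pi i/3} w_{(1,1,1)} + i e^{4\pi i/3} w_{(1,1,2)} \\
 &\qquad {} - w_{(1,2,0)} - e^{2\pi i/3} w_{(1,2,1)} - e^{4\pi i/3} w_{(1,2,2)} \\
 &\qquad \left. {} - i\, w_{(1,3,0)} - i e^{2\pi i/3} w_{(1,3,1)} - i e^{4\pi i/3} w_{(1,3,2)} \right).
\end{aligned} \tag{4.1}$$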


Note that the only difference between the sums for $f((0,*,*))$ and $f((1,*,*))$ is that
the Walsh coefficient $w_{(1,0,0)}$ is subtracted rather than added. In general, when the
schema has a fixed character $p$ in the $j$-th position, each Walsh coefficient whose
$j$-th index is $l$ acquires a phase of $e^{2\pi i l p / k_j}$.


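It is straightforward to make a general theorem from these observations.


Theorem 6 Consider a schema with $m$ fixed characters $p_i$ at positions $j_i$. Then the
average of $f$ over that schema is

$$
\prod_q k_q^{-1/2} \sum_{l_1=0}^{k_{j_1}-1} \sum_{l_2=0}^{k_{j_2}-1} \cdots \sum_{l_m=0}^{k_{j_m}-1}
e^{2\pi i \left( l_1 p_1 / k_{j_1} + l_2 p_2 / k_{j_2} + \cdots + l_m p_m / k_{j_m} \right)}\,
w_{(0,\ldots,0,\,l_1,\,0,\ldots,0,\,l_2,\,0,\ldots,0,\,l_m,\,0,\ldots)},
\tag{4.2}
$$

where the product runs over all $n$ positions and each $l_i$ occupies position $j_i$ of the index.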
Proof Recall that when we write f in terms of its generalized Walsh coefficients, the 
result has n sums, where n is the length of the strings. There is a sum for each of the 
n characters in the string of the argument. Now, when we take the average of f over 
a schema which has a * in the j-th position, only the terms that have no phase change 
as we traverse the j-th dimension survive; this means that only the generalized Walsh 


coefficients with a 0 in the j-th position survive in the final result. 
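
Theorem 6 can be checked numerically on a small example. The following Python sketch compares a schema average computed directly from function values with the Walsh-coefficient sum of Equation 4.2. The helper names, the use of `None` for the wildcard, and the sign convention of the forward transform (negative exponent, as in the Appendix A routines) are assumptions made for illustration.

```python
import cmath
import math
from itertools import product

def walsh_coefficients(k, f):
    """Direct (slow) generalized Walsh transform; f maps each string tuple to its fitness."""
    strings = list(product(*(range(size) for size in k)))
    norm = math.sqrt(len(strings))
    return {j: sum(f[x] * cmath.exp(-2j * cmath.pi * sum(a * b / c for a, b, c in zip(j, x, k)))
                   for x in strings) / norm
            for j in strings}

def schema_average_direct(f, schema):
    """Average of f over the strings matching the schema (None marks a wildcard)."""
    members = [x for x in f if all(s is None or s == xm for s, xm in zip(schema, x))]
    return sum(f[x] for x in members) / len(members)

def schema_average_walsh(k, w, schema):
    """Equation 4.2: sum over the indices that are zero at every wildcard position."""
    ranges = [range(1) if s is None else range(size) for s, size in zip(schema, k)]
    total = 0
    for l in product(*ranges):
        # phase contributed by the fixed characters only
        phase = sum(lm * s / km for lm, s, km in zip(l, schema, k) if s is not None)
        total += cmath.exp(2j * cmath.pi * phase) * w[l]
    return total / math.sqrt(math.prod(k))
```

For a (2,4,3)-ary function stored as a dictionary `f`, `schema_average_walsh((2, 4, 3), walsh_coefficients((2, 4, 3), f), (0, 1, 1))` should agree with `schema_average_direct(f, (0, 1, 1))` up to rounding error.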
We can also define schemata in the space of real numbers, and take schema averages
using generalized Walsh coefficients as before. These schemata are analogous
to the slices used in Goldberg (1990a) rather than the schemata for real parameters
defined in Wright (1991), and are a natural extension of schemata in finite alphabets.

Example Consider a function $f$ over $(\infty,\infty,3,3,3)$-ary strings. Let $(0.67,*,*,*,1)$
refer to the set of strings whose first variable $x_1$ equals 0.67 and last variable $x_5$ equals
1. Then the average of the function $f$ over that schema is

$$
f((0.67,*,*,*,1)) = 3^{-3/2} \sum_{j_1=-\infty}^{\infty} \sum_{j_5=0}^{2}
w_{(j_1,0,0,0,j_5)}\, e^{2\pi i (0.67\, j_1 + j_5/3)}.
\tag{4.3}
$$

To get more familiar with using these generalized Walsh coefficients on real variables,
let us do some more schema averages:

$$
\begin{aligned}
f((*,*,*,*,*)) &= 3^{-3/2}\, w_{(0,0,0,0,0)}; \\
f((*,*,*,*,2)) &= 3^{-3/2} \left( w_{(0,0,0,0,0)} + e^{4\pi i/3} w_{(0,0,0,0,1)} + e^{8\pi i/3} w_{(0,0,0,0,2)} \right); \\
f((0,*,*,*,*)) &= 3^{-3/2} \sum_{j_1=-\infty}^{\infty} w_{(j_1,0,0,0,0)}.
\end{aligned}
\tag{4.4}
$$

This chapter has generalized the notion of schema used in analyzing genetic algorithms
of binary strings and showed how to use generalized Walsh functions and
transforms to compute schema averages. A method of comparing schema averages to
determine deception is given in the next chapter.

Chapter 5


Using Generalized Hadamard Transforms to Detect Deception


5.1 Deception in Non-binary Strings 


Hadamard transforms provide a convenient way of checking deceptive conditions in a
systematic manner (Homaifar, Qi, & Fost, 1991; Homaifar & Qi, 1990). In this chapter,
the Hadamard transform will be generalized to non-binary strings.

Deception in functions over non-binary strings is qualitatively different from
deception in functions over binary strings. In order to get full deception with binary
strings, all schemas of order $n - 1$ or less must point towards the complement of the
global optimum. With non-binary strings, all that is required is that all schemas
of order $n - 1$ or less point away from the global optimum. Mason uses this as his
definition of deception (Mason, 1991).

Example Consider a function $f$ over a ternary string of length 3 whose global
optimum is at (2,2,2). Then some of the conditions necessary for deception are as
follows:

$$
\begin{aligned}
f((2,*,*)) &< f((0,*,*)) \text{ or } f((1,*,*)); \\
f((2,2,*)) &< f((0,0,*)) \text{ or } f((0,1,*)) \text{ or } f((1,0,*)) \text{ or } f((1,1,*)); \\
f((2,2,*)) &< f((1,2,*)) \text{ or } f((0,2,*)); \\
f((2,2,*)) &< f((2,0,*)) \text{ or } f((2,1,*)); \\
f((1,2,*)) &< f((1,0,*)) \text{ or } f((1,1,*)).
\end{aligned}
\tag{5.1}
$$


Mutation for non-binary strings works differently from mutation for binary strings.
Some mutation operators, such as creeping mutation, mutate characters to nearby
characters. For example, using a 10-ary alphabet, the string (0,0,0) is much more
likely to be perturbed to (1,0,0) than to (9,0,0). Similarly, mutations on a real number
might be implemented by adding noise with a Gaussian distribution. If the fitness
function is not too discontinuous, then these kinds of mutations can work as a sort
of gradient descent to help convergence (Bledsoe, 1961). The analysis of how the
distribution of perturbations affects convergence is beyond the scope of this thesis
and overlaps the theory of simulated annealing (Kirkpatrick, Gelatt, & Vecchi, 1983;
Szu & Hartley, 1987). Assuming that the algorithm uses localized perturbations, it
is reasonable to define a fully deceptive function as a function whose schemas of order
$n - 1$ or less point as far away from the global optimum as possible. Now the deceptive
conditions become more straightforward:


$$
\begin{aligned}
f((0,*,*)) &> f((1,*,*)), \\
f((0,*,*)) &> f((2,*,*)), \\
f((0,0,*)) &> f((0,1,*)), \\
f((0,0,*)) &> f((0,2,*)), \\
f((0,0,*)) &> f((1,0,*)), \\
f((0,0,*)) &> f((1,1,*)), \\
f((0,0,*)) &> f((1,2,*)), \\
f((0,0,*)) &> f((2,0,*)), \\
f((0,0,*)) &> f((2,1,*)), \\
f((0,0,*)) &> f((2,2,*)),
\end{aligned}
\tag{5.2}
$$

and all permutations of the strings above.


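5.2 Hadamard Transform


Consider the competition partition in which the fixed positions are at $j_1, j_2, \ldots, j_p$,
where $p$ is the order of the partition.


Definition 6 (Generalized Hadamard Transform) Define the generalized Hadamard
transform matrix $H$ as:

$$
H = h_1 \otimes h_2 \otimes \cdots \otimes h_p,
\tag{5.3}
$$

where $\otimes$ is the tensor product:

$$
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \otimes \begin{pmatrix} e & f \\ g & h \end{pmatrix}
=
\begin{pmatrix}
ae & af & be & bf \\
ag & ah & bg & bh \\
ce & cf & de & df \\
cg & ch & dg & dh
\end{pmatrix}
\tag{5.4}
$$

and $h_m$ is defined as follows:

$$
h_m =
\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & e^{2\pi i / k_{j_m}} & e^{4\pi i / k_{j_m}} & \cdots & e^{(k_{j_m}-1) 2\pi i / k_{j_m}} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & e^{(k_{j_m}-1) 2\pi i / k_{j_m}} & e^{(k_{j_m}-1) 4\pi i / k_{j_m}} & \cdots & e^{(k_{j_m}-1)(k_{j_m}-1) 2\pi i / k_{j_m}}
\end{pmatrix}.
\tag{5.5}
$$

The generalized Hadamard transform will be used in the next section to compute the
conditions for deception.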
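
Definition 6 can be sketched in a few lines of Python. The names `h`, `tensor`, and `hadamard` mirror the Mathematica routines of Appendix A.6, but this pure-Python version, which represents matrices as nested lists, is only an illustrative assumption.

```python
import cmath

def h(k):
    """Single-character matrix of Equation 5.5: entry (a, b) is exp(2 pi i a b / k)."""
    return [[cmath.exp(2j * cmath.pi * a * b / k) for b in range(k)] for a in range(k)]

def tensor(a, b):
    """Tensor (Kronecker) product of two matrices, laid out as in Equation 5.4."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def hadamard(ks):
    """Generalized Hadamard matrix for the alphabet sizes at the fixed positions."""
    result = [[1]]
    for k in ks:
        result = tensor(result, h(k))
    return result
```

For example, `hadamard([3, 2])` builds the 6 x 6 matrix for a partition whose two fixed positions have alphabet sizes 3 and 2.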
5.3 Detecting Deception


Now we make the important assumption that the global optimum is at $(k_1 - 1, k_2 - 1, \ldots, k_n - 1)$
and all schemas of order $n - 1$ or less must point to $(0,0,\ldots,0)$. This
definition of deception requires that the lower-order schemata lead as far away from
the global optimum as possible. This differs from Mason’s definition of deception,
which requires only that the lower-order schemata do not point to the global optimum.
Using this new definition, we can now define the matrix $M$ such that the deceptive
conditions for that partition become $MW > 0$, where

$$
M = \begin{pmatrix}
\text{1st row of } H - \text{2nd row of } H \\
\text{1st row of } H - \text{3rd row of } H \\
\vdots \\
\text{1st row of } H - \text{last row of } H
\end{pmatrix}
\tag{5.6}
$$

and $W$ is the vector of generalized Walsh coefficients used in the competition partition.
Example Let $w$ be the Walsh coefficients for the fitness function $f$ over (3,2,2)-ary
strings. Consider the competition partition $(F,*,*)$, where $F$ represents a fixed
position and $*$ represents a wildcard character. Then

$$
W = \begin{pmatrix} w_{(0,0,0)} \\ w_{(1,0,0)} \\ w_{(2,0,0)} \end{pmatrix}, \quad
H = \begin{pmatrix}
1 & 1 & 1 \\
1 & e^{2\pi i/3} & e^{4\pi i/3} \\
1 & e^{4\pi i/3} & e^{8\pi i/3}
\end{pmatrix}, \quad
M = \begin{pmatrix}
0 & 1 - e^{2\pi i/3} & 1 - e^{4\pi i/3} \\
0 & 1 - e^{4\pi i/3} & 1 - e^{8\pi i/3}
\end{pmatrix}.
\tag{5.7}
$$

The condition that $MW > 0$ becomes the following:

$$
\begin{aligned}
(1 - e^{2\pi i/3})\, w_{(1,0,0)} + (1 - e^{4\pi i/3})\, w_{(2,0,0)} &> 0; \\
(1 - e^{4\pi i/3})\, w_{(1,0,0)} + (1 - e^{8\pi i/3})\, w_{(2,0,0)} &> 0.
\end{aligned}
\tag{5.8}
$$

When we substitute function values for the Walsh coefficients in the above expression,
we get

$$
\begin{aligned}
f((0,*,*)) &> f((1,*,*)), \\
f((0,*,*)) &> f((2,*,*)),
\end{aligned}
\tag{5.9}
$$

which are indeed the deceptive conditions corresponding to that partition.


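Example Let us repeat the above calculations for the competition partition (F, D, F):

$$
W = \begin{pmatrix} w_{(0,0,0)} \\ w_{(0,0,1)} \\ w_{(1,0,0)} \\ w_{(1,0,1)} \\ w_{(2,0,0)} \\ w_{(2,0,1)} \end{pmatrix}, \quad
H = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 \\
1 & 1 & A & A & B & B \\
1 & -1 & A & -A & B & -B \\
1 & 1 & B & B & A & A \\
1 & -1 & B & -B & A & -A
\end{pmatrix}, \quad
M = \begin{pmatrix}
0 & 2 & 0 & 2 & 0 & 2 \\
0 & 0 & 1-A & 1-A & 1-B & 1-B \\
0 & 2 & 1-A & 1+A & 1-B & 1+B \\
0 & 0 & 1-B & 1-B & 1-A & 1-A \\
0 & 2 & 1-B & 1+B & 1-A & 1+A
\end{pmatrix},
\tag{5.10}
$$

where $A = e^{2\pi i/3}$ and $B = e^{-2\pi i/3}$. $MW > 0$ gives

$$
\begin{aligned}
2w_{(0,0,1)} + 2w_{(1,0,1)} + 2w_{(2,0,1)} &> 0, \\
(1-A)w_{(1,0,0)} + (1-A)w_{(1,0,1)} + (1-B)w_{(2,0,0)} + (1-B)w_{(2,0,1)} &> 0, \\
2w_{(0,0,1)} + (1-A)w_{(1,0,0)} + (1+A)w_{(1,0,1)} + (1-B)w_{(2,0,0)} + (1+B)w_{(2,0,1)} &> 0, \\
(1-B)w_{(1,0,0)} + (1-B)w_{(1,0,1)} + (1-A)w_{(2,0,0)} + (1-A)w_{(2,0,1)} &> 0, \\
2w_{(0,0,1)} + (1-B)w_{(1,0,0)} + (1+B)w_{(1,0,1)} + (1-A)w_{(2,0,0)} + (1+A)w_{(2,0,1)} &> 0.
\end{aligned}
\tag{5.11}
$$

Translated into function values, this gives the following set of inequalities:

$$
\begin{aligned}
f((0,*,0)) &> f((0,*,1)), \\
f((0,*,0)) &> f((1,*,0)), \\
f((0,*,0)) &> f((1,*,1)), \\
f((0,*,0)) &> f((2,*,0)), \\
f((0,*,0)) &> f((2,*,1)),
\end{aligned}
\tag{5.12}
$$

which are the deceptive conditions corresponding to the competition partition (F, D, F).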
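
The two examples above can be reproduced numerically. The Python sketch below builds $W$, $H$, and $M$ for an arbitrary competition partition and returns the vector $MW$. The function name `deception_margins`, the 0-based tuple `fixed` of fixed positions, and the forward-transform sign convention are assumptions of this illustration. Under the unitary normalization, each component of $MW$ equals $\sqrt{\prod_m k_m}$ times the difference between the average of the all-zero schema and the average of one competing schema, which is why $MW > 0$ reproduces the inequalities in function values.

```python
import cmath
import math
from itertools import product

def deception_margins(k, f, fixed):
    """Return the vector M.W for the partition whose fixed positions are `fixed`."""
    strings = list(product(*(range(size) for size in k)))
    norm = math.sqrt(len(strings))
    labels = list(product(*(range(k[m]) for m in fixed)))   # schema and index labels

    def embed(l):
        # place the partition indices at the fixed positions, zeros elsewhere
        j = [0] * len(k)
        for m, lm in zip(fixed, l):
            j[m] = lm
        return tuple(j)

    def coefficient(j):
        return sum(f[x] * cmath.exp(-2j * cmath.pi * sum(a * b / c for a, b, c in zip(j, x, k)))
                   for x in strings) / norm

    W = [coefficient(embed(l)) for l in labels]
    H = [[cmath.exp(2j * cmath.pi * sum(pm * lm / k[m] for pm, lm, m in zip(p, l, fixed)))
          for l in labels] for p in labels]
    M = [[first - other for first, other in zip(H[0], row)] for row in H[1:]]
    return [sum(m * w for m, w in zip(row, W)) for row in M]
```

For a real-valued `f` the returned components are real up to rounding error, so the deceptive conditions can be tested with `all(m.real > 0 for m in deception_margins(k, f, fixed))`.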


For a function to be deceptive at order $p$ requires that all order $p$ schemas lead as
far away from the global optimum as possible. Since there are $\binom{n}{p}$ ways of choosing $p$
fixed positions among the $n$ characters, there are that many competition partitions
at that order. In the above example, the function is deceptive at order 1 if it is
deceptive in the partitions (F,D,D), (D,F,D), and (D,D,F). For full deception, the
function must be deceptive at all orders between 1 and $n - 1$ inclusive. Mathematica
routines for computing all of the above are included in Appendix A.
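
The definition of full deception can also be tested directly on schema averages, without any transform. The following Python sketch enumerates the $\binom{n}{p}$ competition partitions of each order with `itertools.combinations` and checks that the all-zero schema has the highest average in each. The function names and the dictionary representation of `f` are assumptions; the brute-force enumeration is meant only for small problems.

```python
from itertools import combinations, product

def schema_average(k, f, schema):
    """Average of f over the strings matching the schema (None marks a wildcard)."""
    members = [x for x in product(*(range(size) for size in k))
               if all(c is None or c == xm for c, xm in zip(schema, x))]
    return sum(f[x] for x in members) / len(members)

def deceptive_in_partition(k, f, fixed):
    """True if the all-zero schema beats every competitor in this partition."""
    def schema(chars):
        full = [None] * len(k)
        for m, c in zip(fixed, chars):
            full[m] = c
        return tuple(full)
    competitors = list(product(*(range(k[m]) for m in fixed)))
    best = schema_average(k, f, schema(competitors[0]))      # competitors[0] is all zeros
    return all(best > schema_average(k, f, schema(c)) for c in competitors[1:])

def deceptive_at_order(k, f, p):
    """Check every one of the C(n, p) competition partitions of order p."""
    return all(deceptive_in_partition(k, f, fixed)
               for fixed in combinations(range(len(k)), p))

def fully_deceptive(k, f):
    """Deceptive at all orders from 1 to n - 1 inclusive."""
    return all(deceptive_at_order(k, f, p) for p in range(1, len(k)))
```

As a usage example, a ternary function of length 3 with `f[x] = 10 - sum(x)` everywhere except `f[(2, 2, 2)] = 11` has its global optimum at (2,2,2) while every schema of order 1 or 2 points to (0,0,0).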

This chapter has defined the deceptive conditions for functions over non-binary
alphabets. As mentioned in Chapter 2, deception is only one of the many reasons why
a genetic algorithm may fail. The next chapter focuses on another mode of failure:
the variances of schema fitness.

Chapter 6


Variance of Fitness


Although the schema theorem predicts the expected growth rate of a schema in a
population, the finite population size introduces a sampling error that can lead to large
deviations from expectations and can affect GA convergence (Goldberg & Rudnick,
1991; Goldberg, Deb, & Clark, 1991; Rudnick & Goldberg, 1991; Schaffer, Eshelman,
& Offutt, 1991).

The full theory of how this sampling error affects GA behavior is beyond the scope
of this thesis. Only the methods for calculating the signal and noise in a competition
partition (Rudnick & Goldberg, 1991) will be extended to non-binary alphabets.


Definition 7 (Index Set) The index set $G(h)$ of a schema $h$ is the set of indices
of the generalized Walsh coefficients used to express the function average over the
schema $h$.


Example In the space of strings of length 3 taken from a ternary alphabet,

$$
\begin{aligned}
G((*,*,*)) &= \{(0,0,0)\}, \\
G((0,*,*)) = G((1,*,*)) = G((2,*,*)) &= \{(0,0,0), (1,0,0), (2,0,0)\} = (*,0,0), \\
G((0,0,*)) &= (*,*,0).
\end{aligned}
\tag{6.1}
$$

This notation helps to make the definitions of signal and noise clearer and more
compact.
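
A minimal Python sketch of Definition 7 follows. The function name `index_set` and the use of `None` for a wildcard are assumptions made for illustration: every fixed position contributes all of its index values, and every wildcard position contributes only the index 0.

```python
from itertools import product

def index_set(k, schema):
    """Indices of the Walsh coefficients that appear in the average over `schema`."""
    ranges = [range(1) if s is None else range(size) for size, s in zip(k, schema)]
    return set(product(*ranges))
```

The set depends only on which positions are fixed, not on the fixed characters themselves, so every schema in a competition partition shares the same index set.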


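Definition 8 (Collateral Noise) The collateral noise $\sigma_h$ for a schema $h$ is defined
by the following:

$$
\begin{aligned}
\sigma_h^2 = \mathrm{var}(f(h)) &= \frac{1}{|h|} \sum_{x \in h} \left| f(x) - \langle f(x) \rangle_h \right|^2 \\
&= \frac{1}{|h|} \sum_{x \in h} |f(x)|^2 - \left| \langle f(x) \rangle_h \right|^2 \\
&= \langle f^2(x) \rangle_h - \langle f(x) \rangle_h^2.
\end{aligned}
\tag{6.2}
$$

$\langle f(x) \rangle_h$ indicates the average of $f(x)$ where $x$ ranges over schema $h$, $\langle f^2(x) \rangle_h$ indicates
the average of $f^2(x)$ over $h$, and $|h|$ is the number of individuals represented by schema
$h$.

This definition differs from Rudnick and Goldberg (1991), which defines the collateral
noise as $\sigma_h^2$.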


Definition 9 (Partition Signal) The signal $S(J)$ of a competition partition $J$ is
defined as

$$
S^2(J) = \left\langle |f(h)|^2 \right\rangle_J - \left| \langle f(h) \rangle_J \right|^2.
\tag{6.3}
$$

$\langle f(h) \rangle_J$ denotes the average of schema averages $f(h)$ where $h$ runs over $J$. $\langle |f(h)|^2 \rangle_J$
denotes the average of the absolute value squares of schema averages in the competition
partition $J$.

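The first term in Equation 6.3 can be written as

$$
\left\langle |f(h)|^2 \right\rangle_J = \frac{1}{|J|} \sum_{h \in J} |f(h)|^2,
\tag{6.4}
$$

where $|J|$ is the number of schemata in $J$. Substituting the expression for $f(h)$ in
terms of generalized Walsh coefficients gives

$$
\left\langle |f(h)|^2 \right\rangle_J = \frac{1}{|J|} \sum_{h \in J} \prod_m k_m^{-1}
\Bigl| \sum_{\vec{j} \in G(h)} w_{\vec{j}}\, e^{2\pi i \sum_m j_m p_m / k_m} \Bigr|^2,
\tag{6.5}
$$

where $p_m$ is the character of $h$ at position $m$; wildcard positions do not contribute
because $j_m = 0$ there. Expanding the quadratic yields

$$
\left\langle |f(h)|^2 \right\rangle_J = \frac{1}{|J|} \sum_{h \in J} \prod_m k_m^{-1}
\sum_{\vec{j}, \vec{l} \in G(h)} w_{\vec{j}}\, \bar{w}_{\vec{l}}\,
e^{2\pi i \sum_m (\vec{j} \ominus \vec{l})_m p_m / k_m},
\tag{6.6}
$$

where $\vec{j} \ominus \vec{l} = (j_1 - l_1 \bmod k_1,\; j_2 - l_2 \bmod k_2,\; \ldots,\; j_n - l_n \bmod k_n)$. Due to the
orthogonality of the generalized Walsh functions, this reduces to

$$
\left\langle |f(h)|^2 \right\rangle_J = \prod_m k_m^{-1} \sum_{\vec{j} \in G(h)} |w_{\vec{j}}|^2.
\tag{6.7}
$$

The second term in Equation 6.3 reduces to the following product:

$$
\left| \langle f(h) \rangle_J \right|^2 = \prod_m k_m^{-1}\, |w_{\vec{0}}|^2.
\tag{6.8}
$$

Thus, writing $G(J)$ for the index set shared by every schema $h$ in $J$, Equation 6.3
simplifies to:

$$
\begin{aligned}
S^2(J) &= \prod_m k_m^{-1} \Bigl( \sum_{\vec{j} \in G(J)} |w_{\vec{j}}|^2 - |w_{\vec{0}}|^2 \Bigr) \\
&= \prod_m k_m^{-1} \sum_{\vec{j} \in G(J) - \{\vec{0}\}} |w_{\vec{j}}|^2.
\end{aligned}
\tag{6.9}
$$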


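Example Consider (3,3,3)-ary strings and the competition partition $J = (F, F, *)$.
The square of the partition signal of $J$ is

$$
\begin{aligned}
S^2(J) = \tfrac{1}{27} \bigl( & |w_{(0,1,0)}|^2 + |w_{(0,2,0)}|^2 \\
& + |w_{(1,0,0)}|^2 + |w_{(1,1,0)}|^2 + |w_{(1,2,0)}|^2 \\
& + |w_{(2,0,0)}|^2 + |w_{(2,1,0)}|^2 + |w_{(2,2,0)}|^2 \bigr).
\end{aligned}
\tag{6.10}
$$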
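
The partition signal can be cross-checked numerically. The Python sketch below computes $S^2(J)$ once as the variance of the schema averages and once as the sum of $|w_{\vec{j}}|^2$ over $G(J) - \{\vec{0}\}$. It assumes the unitary normalization used by the Appendix A routines, so the coefficient sum carries a factor $\prod_m k_m^{-1}$; the function names and the 0-based tuple `fixed` of fixed positions are illustrative assumptions.

```python
import cmath
import math
from itertools import product

def walsh_coefficients(k, f):
    """Direct (slow) generalized Walsh transform; f maps each string tuple to its fitness."""
    strings = list(product(*(range(size) for size in k)))
    norm = math.sqrt(len(strings))
    return {j: sum(f[x] * cmath.exp(-2j * cmath.pi * sum(a * b / c for a, b, c in zip(j, x, k)))
                   for x in strings) / norm
            for j in strings}

def partition_signal_direct(k, f, fixed):
    """S^2(J) as the variance of the schema averages over the partition."""
    averages = []
    for chars in product(*(range(k[m]) for m in fixed)):
        members = [x for x in f if all(x[m] == c for m, c in zip(fixed, chars))]
        averages.append(sum(f[x] for x in members) / len(members))
    mean = sum(averages) / len(averages)
    return sum(abs(a - mean) ** 2 for a in averages) / len(averages)

def partition_signal_walsh(k, w, fixed):
    """S^2(J) from the coefficients in G(J) - {0}, scaled by 1 / prod(k)."""
    total = sum(abs(c) ** 2 for j, c in w.items()
                if any(j) and all(jm == 0 for m, jm in enumerate(j) if m not in fixed))
    return total / math.prod(k)
```

For the partition $(F, F, *)$ over (3,3,3)-ary strings, `fixed` is `(0, 1)` and the coefficient sum runs over the eight indices listed in the example above.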


Definition 10 (Partition RMS Noise) The root-mean-squared noise $C(J)$ of a
competition partition $J$ is defined as the root-mean-square of the collateral noises
for each of the schemata $h$ in the partition.

$$
C^2(J) = \left\langle \sigma_h^2 \right\rangle_J.
\tag{6.11}
$$


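In Chapter 3, there were some examples (3.13, 3.14) of computing the $\langle f^2(x) \rangle_h$ term
in the expression for $\sigma_h^2$ (6.2). These examples can be generalized to give the following
result. For the complete derivations for binary strings, see Goldberg & Rudnick (1991)
and Rudnick & Goldberg (1991).

$$
\sigma_h^2 = \prod_m k_m^{-1} \sum_{(\vec{j}, \vec{l}) \in \tilde{G}(h) - G^2(h)}
w_{\vec{j}}\, \bar{w}_{\vec{l}}\, e^{2\pi i \sum_m (\vec{j} \ominus \vec{l})_m p_m / k_m},
\tag{6.12}
$$

where $p_m$ is the character of $h$ at position $m$, $G^2(h) = G(h) \times G(h)$,
$\tilde{G}(h) = \{ (\vec{j}, \vec{l}) : \vec{j} \ominus \vec{l} \in G(h) \}$, and
$\vec{j} \ominus \vec{l} = (j_1 - l_1 \bmod k_1,\; j_2 - l_2 \bmod k_2,\; \ldots,\; j_n - l_n \bmod k_n)$.
As before, the off-diagonal terms vanish when averaged over the partition, leaving

$$
C^2(J) = \prod_m k_m^{-1} \sum_{\vec{j} \in \bar{G}(J)} |w_{\vec{j}}|^2,
\tag{6.13}
$$

where $\bar{G}(J)$ is the complement of the set $G(J)$.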
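Example Consider again the above example using (3,3,3)-ary strings and the competition
partition $J = (F, F, *)$. The square of the partition RMS noise of $J$ is

$$
\begin{aligned}
C^2(J) = \tfrac{1}{27} \bigl( & |w_{(0,0,1)}|^2 + |w_{(0,0,2)}|^2 + |w_{(0,1,1)}|^2 + |w_{(0,1,2)}|^2 \\
& + |w_{(0,2,1)}|^2 + |w_{(0,2,2)}|^2 + |w_{(1,0,1)}|^2 + |w_{(1,0,2)}|^2 \\
& + |w_{(1,1,1)}|^2 + |w_{(1,1,2)}|^2 + |w_{(1,2,1)}|^2 + |w_{(1,2,2)}|^2 \\
& + |w_{(2,0,1)}|^2 + |w_{(2,0,2)}|^2 + |w_{(2,1,1)}|^2 + |w_{(2,1,2)}|^2 \\
& + |w_{(2,2,1)}|^2 + |w_{(2,2,2)}|^2 \bigr).
\end{aligned}
\tag{6.14}
$$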
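
The noise calculation admits the same kind of numerical check. The Python sketch below computes $C^2(J)$ once as the mean of the within-schema variances and once as the sum of $|w_{\vec{j}}|^2$ over the complement of $G(J)$. As in the text, it assumes the unitary normalization of the Appendix A routines, which gives the coefficient sum a factor $\prod_m k_m^{-1}$; the function names are illustrative.

```python
import cmath
import math
from itertools import product

def walsh_coefficients(k, f):
    """Direct (slow) generalized Walsh transform; f maps each string tuple to its fitness."""
    strings = list(product(*(range(size) for size in k)))
    norm = math.sqrt(len(strings))
    return {j: sum(f[x] * cmath.exp(-2j * cmath.pi * sum(a * b / c for a, b, c in zip(j, x, k)))
                   for x in strings) / norm
            for j in strings}

def partition_noise_direct(k, f, fixed):
    """C^2(J): mean over the schemata of J of the fitness variance inside each schema."""
    variances = []
    for chars in product(*(range(k[m]) for m in fixed)):
        values = [f[x] for x in f if all(x[m] == c for m, c in zip(fixed, chars))]
        mean = sum(values) / len(values)
        variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    return sum(variances) / len(variances)

def partition_noise_walsh(k, w, fixed):
    """C^2(J) from the coefficients outside G(J), scaled by 1 / prod(k)."""
    total = sum(abs(c) ** 2 for j, c in w.items()
                if any(jm != 0 for m, jm in enumerate(j) if m not in fixed))
    return total / math.prod(k)
```

With `fixed = ()` the partition contains a single schema covering the whole space, and both functions return the total variance of `f`, which is the sum of the signal and noise of any other partition.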


For large alphabets and fairly smooth fitness functions, we can use the alphabet-size
reduction described earlier to get a good approximation to the variances with
relatively little effort. For instance, the variance of Test Function 2 is

$$
\sigma^2 = \int_0^1 \left( x^2 (1-x)^2 (1/2 - x)^3 \right)^2 dx
- \left( \int_0^1 x^2 (1-x)^2 (1/2 - x)^3\, dx \right)^2 = 1.734 \times 10^{-7}.
\tag{6.15}
$$

We can approximate the same calculation using the 7-ary reduction.

$$
\frac{1}{7} \left( |w_1|^2 + |w_2|^2 + |w_3|^2 + |w_4|^2 + |w_5|^2 + |w_6|^2 \right) = 1.723 \times 10^{-7}.
\tag{6.16}
$$


This chapter has shown that the signal and noise calculations for functions over
binary strings can be generalized to non-binary strings as well. The signal-to-noise
calculations and the Hadamard transforms are tools which examine two independent
modes by which a genetic algorithm can fail to converge to the global optimum.

Chapter 7


Conclusion


This thesis has shown how the Walsh functions and many of the techniques that use
them can be generalized to non-binary alphabets in a natural and straightforward way.
This conclusion focuses on possible topics for future research that use generalized
Walsh functions.

Directions for future research using generalized Walsh functions and transforms
include generalizing more of the theory for binary strings, such as the nonuniform
Walsh-schema transform (Bridges & Goldberg, 1991), creating deceptive problems
(Liepins & Vose, 1990b; Liepins & Vose, 1991; Whitley, 1991a), operator-adjusted
Walsh coefficients (Goldberg, 1989c), and the sufficient conditions for deception (Deb
& Goldberg, 1992).


Another possibility is a probabilistic approach towards deception. Checking for
deception requires evaluating the fitness function over every single point in the entire
space, and for this reason, it has never been done for anything but tractable problems
or problems that were handmade to be difficult. Given the empirical evidence
that smoother functions are easier to optimize, it seems possible that the probability 
that a function is deceptive depends on the asymptotic behavior of the generalized 
Walsh coefficients. It would be useful to have either analytical results or a table of 
probabilities that relate how quickly the coefficients decay, the length of the string, 
the cardinality, and the order of static deception. This approach might give a way 
of determining which representation is the likeliest to work for a given problem, and 
whether a GA is likely to solve the problem or not, using only knowledge about the 
smoothness of the fitness function. 

In the theory of nonlinear systems, there are two methods for finding solutions.
The first is to make a linear approximation; the second is to use some symmetry
of the system to reduce the dimensionality of the problem. The current theory for
genetic algorithms is largely based on the first method; it makes predictions based on
a linearized model of the system at generation zero. As Grefenstette (1991) pointed
out, the validity of this linear approximation as time progresses is open to question.
Methods of analysis based on a symmetry of the system would be valid for any length
of time. N. Packard (personal communication, 1991) and this author have speculated
that a renormalization group technique such as that used in analyzing lattice processes
in physics could also be used for analyzing GAs. The genetic algorithm, not including
the user-defined fitness function, appears to have no fundamental length scale other
than the string length, which corresponds to the lattice size in physics. Also, the way
that low-order schemata combine to form higher-order schemata is reminiscent of a
phase transition. These two facts, along with some numerical experiments, suggest
that scaling does occur in some situations and that a renormalization group approach
would be promising. A renormalization group approach would involve treating blocks
of characters in the string as a single character of higher cardinality, and then asking
what a genetic algorithm with this new representation would do. This thesis, which
provides a framework for a theory of genetic algorithms over non-binary strings, is a
step in that direction.

Appendix A


Mathematica 2.0 Programs


A.1 Generalized Walsh Functions Over k-ary Strings


k is the size of the alphabet
n is the maximum length of the strings
j is the index (in integer form) of the generalized Walsh function
x is the argument (in integer form) of the generalized Walsh function

Psi[k_Integer,n_Integer,j_Integer,x_Integer] :=
    E^(2 Pi I IntegerDigits[j,k,n].IntegerDigits[x,k,n] / k)/
        Sqrt[k^n]


Example From the ternary alphabet and strings of length two example in the text
above, we evaluate the fourth Walsh function at position two.

In[3]:= Psi[3,2,4,2]

Out[3]= E^((-2 I)/3 Pi)/3


A.2 Generalized Walsh Functions Over More Arbitrary Strings


k is a list of the sizes of the alphabet used in each character
j is the index (in vector form) of the generalized Walsh function
x is the argument (in vector form) of the generalized Walsh function

Psi[k_List,j_List,x_List]:=
    E^(2 Pi I Plus @@ MapThread[#1 #2/#3 &, {j, x, k}])/
        Sqrt[Times@@k]


Example Consider the set of strings with length two. The first character is chosen
from a binary alphabet, and the second from a ternary alphabet. Evaluate the (0,2)
Walsh function at position (1,2).

In[6]:= Psi[{2,3},{0,2},{1,2}]

Out[6]= E^((2 I)/3 Pi)/Sqrt[6]


A.3 Generalized Fast Walsh Transform Over k-ary Strings


k is the size of the alphabet
l is the list of function values whose length is a power of k

GFWT[k_Integer,l_List]:=
    Flatten[
        Nest[Function[z,Table[E^-(2 Pi I j x/k),{j,0,k-1},{x,0,k-1}].z] /@
            Partition[#,k]&,l,Round[Log[k,Length[l]]]]]/
        Sqrt[k^Round[Log[k,Length[l]]]]

InverseGFWT[k_Integer,l_List]:=
    Flatten[
        Nest[Function[z,Table[E^(2 Pi I j x/k),{j,0,k-1},{x,0,k-1}].z] /@
            Partition[#,k]&,l,Round[Log[k,Length[l]]]]]/
        Sqrt[k^Round[Log[k,Length[l]]]]


Example Create a list of 9 random numbers, find the transform for a ternary
alphabet, and then untransform the data to recover the original 9 numbers.

In[3]:= Table[Random[],{9}]

Out[3]= {0.177361, 0.503785, 0.387824, 0.615398, 0.254133, 0.488114,
>    0.00850412, 0.539387, 0.788743}

In[4]:= GFWT[3,%] //N

Out[4]= {1.25442, -0.226577 + 0.106052 I, -0.226577 - 0.106052 I,
>    -0.0927234 - 0.00606561 I, -0.0247775 - 0.362999 I,
>    -0.0170897 - 0.156521 I, -0.0927234 + 0.00606561 I,
>    -0.0170897 + 0.156521 I, -0.0247775 + 0.362999 I}

In[5]:= InverseGFWT[3,%] //N //Chop

Out[5]= {0.177361, 0.503785, 0.387824, 0.615398, 0.254133, 0.488114,
>    0.00850412, 0.539387, 0.788743}


A.4 Generalized Fast Walsh Transform Over More Arbitrary Strings


k is a list of the sizes of the alphabets in the string
l is the list of function values whose length is the product of
the sizes of the alphabets

GFWT[k_List,l_List]:=
    Flatten[Fold[
        Function[z, Table[E^-(2 Pi I i j/#2),{i,0,#2-1},{j,0,#2-1}] . z] /@
            Partition[#1,#2] &, l, Reverse[k]]]/
        Sqrt[Times@@k]

InverseGFWT[k_List,l_List]:=
    Flatten[Fold[
        Function[z, Table[E^(2 Pi I i j/#2),{i,0,#2-1},{j,0,#2-1}] . z] /@
            Partition[#1,#2] &, l, Reverse[k]]]/
        Sqrt[Times@@k]


Example Create a list of 6 random numbers and transform it over strings of length
2 whose first character is chosen from a binary alphabet and whose second character
is chosen from a ternary alphabet.

In[3]:= Table[Random[],{6}]

Out[3]= {0.222531, 0.656238, 0.587434, 0.827656, 0.34676, 0.514834}

In[4]:= GFWT[{2,3},%] //N

Out[4]= {1.28821, -0.000998969 + 0.0350975 I, -0.000998969 - 0.0350975 I,
>    -0.0910589, -0.325032 - 0.0837492 I, -0.325032 + 0.0837492 I}

In[5]:= InverseGFWT[{2,3},%] //N //Chop

Out[5]= {0.222531, 0.656238, 0.587434, 0.827656, 0.34676, 0.514834}


A.5 Converting Schema Averages to Walsh Coefficients


k is a list containing the sizes of the alphabets
schema is the schema whose fitness we want to evaluate
D is the don’t-care symbol
w is the name of the generalized Walsh coefficients

SchemaToWalsh[k_List,schema_List,w_]:=
    Sum @@
        Prepend[
            Delete[MapIndexed[{j[#2[[1]]],0,#1-1}&,k], Position[schema,D]],
            w[(j[#]&/@Range[Length[k]]).
                Reverse[FoldList[Times,1,Reverse[Rest[k]]]]]
            E^(2 Pi I Plus @@
                MapIndexed[j[#2[[1]]] schema[[#2[[1]]]]/#1 &,k])
            /. (j[#] ->0 &/@ Flatten[Position[schema,D]])
        ] / Sqrt[Times@@k]


Example Consider strings of length 3 whose characters are taken from alphabets
with cardinality 2, 4, and 2. Calculate several schema averages in terms of the
generalized Walsh coefficients.

In[14]:= SchemaToWalsh[{2,4,2},{D,D,D},w]

         w[0]
Out[14]= ----
          4

In[15]:= SchemaToWalsh[{2,4,2},{0,D,D},w]

         w[0] + w[8]
Out[15]= -----------
              4

In[16]:= SchemaToWalsh[{2,4,2},{1,D,D},w]

         w[0] - w[8]
Out[16]= -----------
              4

In[17]:= SchemaToWalsh[{2,4,2},{0,0,D},w]

         w[0] + w[2] + w[4] + w[6] + w[8] + w[10] + w[12] + w[14]
Out[17]= --------------------------------------------------------
                                    4

In[18]:= SchemaToWalsh[{2,4,2},{D,0,D},w]

         w[0] + w[2] + w[4] + w[6]
Out[18]= -------------------------
                     4


A.6 Hadamard Transforms


(* tensor[a,b] returns the tensor product of a and b *)
tensor[a_,b_]:=With[{s1=Length[a],s2=Length[b]},
    Table[a[[Ceiling[i/s2],Ceiling[j/s2]]]
          b[[Mod[i-1,s2]+1,Mod[j-1,s2]+1]],{i,s1 s2},{j,s1 s2}]]

(* returns a Hadamard matrix for a single character *)
h[k_]:= Table[E^(2 Pi I i j/k),{i,0,k-1},{j,0,k-1}]

(* returns a Hadamard matrix for a set of characters *)
H[k_List]:= Fold[tensor[#1,#2]&,h[First[k]],h/@Rest[k]]

(* returns the matrix for finding deceptive conditions *)
M[k_List]:= With[{hmat=H[k]},
    Table[hmat[[1]]-hmat[[i]],{i,2,Length[hmat]}]]


A.7 Determining Deception Using Hadamard Transforms


(* returns a list of Walsh coefficients in the partition *)
(* specified by the schema *)
WalshList[k_List,schema_List,w_]:=
    Flatten[
        Table @@
            Prepend[
                Delete[MapIndexed[{j[#2[[1]]],0,#1-1}&,k],Position[schema,D]],
                w[(j[#]&/@Range[Length[k]]).
                    Reverse[FoldList[Times,1,Reverse[Rest[k]]]]]
                /. (j[#] ->0 &/@ Flatten[Position[schema,D]])
            ]]

(* returns the Walsh sums needed for the deceptive conditions *)
(* F = fixed position, D = don’t care *)
Decep[k_List,schema_List,w_]:=
    M[k[[#[[1]]]]& /@ Position[schema,F]].WalshList[k, schema, w]


Example Consider a function over a (3,2,2)-ary alphabet and the competition partition
(F,D,D). Decep[{3,2,2},{F,D,D},w] returns a vector whose components must
all be positive for the function to be deceptive.

In[3]:= Decep[{3,2,2},{F,D,D},w]

Out[3]= {(1 - E^((2 I)/3 Pi)) w[4] + (1 - E^((-2 I)/3 Pi)) w[8],
>    (1 - E^((-2 I)/3 Pi)) w[4] + (1 - E^((2 I)/3 Pi)) w[8]}


Example Now we consider the above function over the competition partition (F,F,D).

In[4]:= Decep[{3,2,2},{F,F,D},w]

Out[4]= {2 w[2] + 2 w[6] + 2 w[10],
>    (1 - E^((2 I)/3 Pi)) w[4] + (1 - E^((2 I)/3 Pi)) w[6] +
>    (1 - E^((-2 I)/3 Pi)) w[8] + (1 - E^((-2 I)/3 Pi)) w[10],
>    2 w[2] + (1 - E^((2 I)/3 Pi)) w[4] + (1 + E^((2 I)/3 Pi)) w[6] +
>    (1 - E^((-2 I)/3 Pi)) w[8] + (1 + E^((-2 I)/3 Pi)) w[10],
>    (1 - E^((-2 I)/3 Pi)) w[4] + (1 - E^((-2 I)/3 Pi)) w[6] +
>    (1 - E^((2 I)/3 Pi)) w[8] + (1 - E^((2 I)/3 Pi)) w[10],
>    2 w[2] + (1 - E^((-2 I)/3 Pi)) w[4] + (1 + E^((-2 I)/3 Pi)) w[6] +
>    (1 - E^((2 I)/3 Pi)) w[8] + (1 + E^((2 I)/3 Pi)) w[10]}


A.8 Determining Deception to Specified Order


GenDecep[k_List,order_,w_]:=
    Flatten[
        Decep[k,#,w]& /@
            Permutations[Join[Table[F,{order}],
                              Table[D,{Length[k]-order}]]]]


Example Consider the function used in the previous example. In order for the function
to be deceptive at the 1-st order, all the components of GenDecep[{3,2,2},1,w]
must be positive.

In[5]:= GenDecep[{3,2,2},1,w]

Out[5]= {(1 - E^((2 I)/3 Pi)) w[4] + (1 - E^((-2 I)/3 Pi)) w[8],
>    (1 - E^((-2 I)/3 Pi)) w[4] + (1 - E^((2 I)/3 Pi)) w[8], 2 w[2], 2 w[1]}


A.9 Fully Deceptive Conditions


FullDecep[k_List,w_]:=
    Flatten[Table[GenDecep[k,q,w],{q,1,Length[k]-1}]]


Example Consider again a function over strings taken from a (3,2,2)-ary alphabet.
Then the conditions for full deception are that each of the components of the vector
returned by FullDecep[{3,2,2},w] is positive.

In[6]:= FullDecep[{3,2,2},w]

Out[6]= {(1 - E^((2 I)/3 Pi)) w[4] + (1 - E^((-2 I)/3 Pi)) w[8],
>    (1 - E^((-2 I)/3 Pi)) w[4] + (1 - E^((2 I)/3 Pi)) w[8], 2 w[2], 2 w[1],
>    2 w[2] + 2 w[6] + 2 w[10],
>    (1 - E^((2 I)/3 Pi)) w[4] + (1 - E^((2 I)/3 Pi)) w[6] +
>    (1 - E^((-2 I)/3 Pi)) w[8] + (1 - E^((-2 I)/3 Pi)) w[10],
>    2 w[2] + (1 - E^((2 I)/3 Pi)) w[4] + (1 + E^((2 I)/3 Pi)) w[6] +
>    (1 - E^((-2 I)/3 Pi)) w[8] + (1 + E^((-2 I)/3 Pi)) w[10],
>    (1 - E^((-2 I)/3 Pi)) w[4] + (1 - E^((-2 I)/3 Pi)) w[6] +
>    (1 - E^((2 I)/3 Pi)) w[8] + (1 - E^((2 I)/3 Pi)) w[10],
>    2 w[2] + (1 - E^((-2 I)/3 Pi)) w[4] + (1 + E^((-2 I)/3 Pi)) w[6] +
>    (1 - E^((2 I)/3 Pi)) w[8] + (1 + E^((2 I)/3 Pi)) w[10],
>    2 w[1] + 2 w[5] + 2 w[9],
>    (1 - E^((2 I)/3 Pi)) w[4] + (1 - E^((2 I)/3 Pi)) w[5] +
>    (1 - E^((-2 I)/3 Pi)) w[8] + (1 - E^((-2 I)/3 Pi)) w[9],
>    2 w[1] + (1 - E^((2 I)/3 Pi)) w[4] + (1 + E^((2 I)/3 Pi)) w[5] +
>    (1 - E^((-2 I)/3 Pi)) w[8] + (1 + E^((-2 I)/3 Pi)) w[9],
>    (1 - E^((-2 I)/3 Pi)) w[4] + (1 - E^((-2 I)/3 Pi)) w[5] +
>    (1 - E^((2 I)/3 Pi)) w[8] + (1 - E^((2 I)/3 Pi)) w[9],
>    2 w[1] + (1 - E^((-2 I)/3 Pi)) w[4] + (1 + E^((-2 I)/3 Pi)) w[5] +
>    (1 - E^((2 I)/3 Pi)) w[8] + (1 + E^((2 I)/3 Pi)) w[9],
>    2 w[1] + 2 w[3], 2 w[2] + 2 w[3], 2 w[1] + 2 w[2]}

Appendix B


Another Basis for Generalized Transforms

As in Fourier series, we can use sines and cosines as well as complex exponentials as
our basis functions. The disadvantage of using sines and cosines as basis functions is
that the basic theorems become somewhat more complex. However, the advantage
is that all the Walsh coefficients of a real function are real in this basis. We will
refer to the generalized Walsh functions and transforms using sines and cosines as real
generalized Walsh functions and transforms.

Since $\cos(x) = \frac{e^{ix} + e^{-ix}}{2}$ and $\sin(x) = \frac{e^{ix} - e^{-ix}}{2i}$, one can convert between the generalized
Walsh coefficients used in the previous sections and the real generalized Walsh
coefficients.


To start, consider a function over strings of length 1. There are two cases for the
transform itself, one for even $k$ and one for odd $k$.

Odd k The real generalized Walsh transform is given by

$$
\begin{aligned}
a_0 &= w_0, \\
a_j &= \frac{w_j + w_{k-j}}{2^{1/2}} \quad \text{for } j \neq 0, \\
b_j &= \frac{i}{2^{1/2}} \left( w_j - w_{k-j} \right).
\end{aligned}
\tag{B.1}
$$

The inverse real generalized Walsh transform is given by the following:

$$
f(x) = k^{-1/2} \Bigl( a_0 + \sum_{j=1}^{(k-1)/2} \bigl( a_j 2^{1/2} \cos(2\pi j x / k) + b_j 2^{1/2} \sin(2\pi j x / k) \bigr) \Bigr).
\tag{B.2}
$$

Of course, one can express the $a_j$s and $b_j$s in terms of the inner products of the cosines
and sines with the function $f$:

$$
\begin{aligned}
a_0 &= \frac{1}{k^{1/2}} \sum_{x=0}^{k-1} f(x); \\
a_j &= \frac{2^{1/2}}{k^{1/2}} \sum_{x=0}^{k-1} \cos(2\pi j x / k)\, f(x) \quad \text{for } j \neq 0; \\
b_j &= \frac{2^{1/2}}{k^{1/2}} \sum_{x=0}^{k-1} \sin(2\pi j x / k)\, f(x).
\end{aligned}
\tag{B.3}
$$
«=0 
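Even k

$$
\begin{aligned}
a_0 &= w_0; \\
a_{k/2} &= 2^{-1/2} w_{k/2}; \\
a_j &= \frac{w_j + w_{k-j}}{2^{1/2}} \quad \text{for } j \neq 0 \text{ and } j \neq k/2; \\
b_j &= \frac{i}{2^{1/2}} \left( w_j - w_{k-j} \right).
\end{aligned}
\tag{B.4}
$$

$$
f(x) = k^{-1/2} \Bigl( a_0 + \sum_{j=1}^{k/2-1} \bigl( a_j 2^{1/2} \cos(2\pi j x / k) + b_j 2^{1/2} \sin(2\pi j x / k) \bigr)
+ 2^{1/2} \cos(\pi x)\, a_{k/2} \Bigr).
\tag{B.5}
$$

Again, the coefficients can be expressed in terms of the inner product of sines and
cosines with $f(x)$:

$$
\begin{aligned}
a_0 &= \frac{1}{k^{1/2}} \sum_{x=0}^{k-1} f(x); \\
a_j &= \frac{2^{1/2}}{k^{1/2}} \sum_{x=0}^{k-1} \cos(2\pi j x / k)\, f(x) \quad \text{for } j \neq 0 \text{ and } j \neq k/2; \\
a_{k/2} &= \frac{1}{(2k)^{1/2}} \sum_{x=0}^{k-1} \cos(\pi x)\, f(x); \\
b_j &= \frac{2^{1/2}}{k^{1/2}} \sum_{x=0}^{k-1} \sin(2\pi j x / k)\, f(x).
\end{aligned}
\tag{B.6}
$$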

The proof that the inverse real generalized Walsh transform of the real generalized
Walsh transform of a function is the function itself comes from substituting the
expressions for $a_j$ and $b_j$ (B.1, B.4) into the expressions for $f(x)$ (B.2, B.5) and verifying
that the result is indeed the inverse generalized Walsh transform (3.7).
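
For odd $k$, the real transform and its inverse can be sketched directly from the inner-product form. The following Python code is a minimal illustration under the stated normalization; the function names are assumptions, and `b[0]` is an unused placeholder kept only so that `a` and `b` share the same indices.

```python
import math

def real_walsh_odd(f):
    """Real generalized Walsh transform of a length-k list f, for odd k."""
    k = len(f)
    if k % 2 == 0:
        raise ValueError("this sketch covers odd k only")
    a = [sum(f) / math.sqrt(k)]
    b = [0.0]                                   # placeholder: there is no b_0
    for j in range(1, (k - 1) // 2 + 1):
        scale = math.sqrt(2 / k)
        a.append(scale * sum(math.cos(2 * math.pi * j * x / k) * fx for x, fx in enumerate(f)))
        b.append(scale * sum(math.sin(2 * math.pi * j * x / k) * fx for x, fx in enumerate(f)))
    return a, b

def inverse_real_walsh_odd(a, b):
    """Rebuild f(x) for x = 0..k-1 from the real coefficients (odd k)."""
    k = 2 * (len(a) - 1) + 1
    values = []
    for x in range(k):
        series = sum(a[j] * math.cos(2 * math.pi * j * x / k) +
                     b[j] * math.sin(2 * math.pi * j * x / k)
                     for j in range(1, len(a)))
        values.append((a[0] + math.sqrt(2) * series) / math.sqrt(k))
    return values
```

Because the cosine and sine basis vectors are orthonormal under this scaling, the round trip `inverse_real_walsh_odd(*real_walsh_odd(f))` should return `f` up to rounding error, and all coefficients of a real `f` are real.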

There are two ways of generalizing the above method to $n$ dimensions. The first
is simply to take the Fourier transform along each dimension as we did before, but to
use sines and cosines as we have done above. The problem with using this method is
mainly notational complexity, although the idea is just as simple as the generalized
Walsh transforms we discussed before; all we are doing is taking the Fourier transform
in an $n$-dimensional space, using sines and cosines. As this first method of generalizing
the sine and cosine transform becomes cumbersome for long strings, it will not be
pursued any further in this thesis.


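The second method of generalizing the Walsh transform using sines and cosines
works as follows:

$$
\begin{aligned}
a_{\vec{0}} &= \prod_m k_m^{-1/2} \sum_{\vec{x}} f(\vec{x}); \\
a_{\vec{j}} &= 2^{1/2} \prod_m k_m^{-1/2} \sum_{\vec{x}} \cos\bigl(2\pi (j_1 x_1 / k_1 + j_2 x_2 / k_2 + \cdots + j_n x_n / k_n)\bigr) f(\vec{x}) \quad \text{for } \vec{j} \neq \vec{0}; \\
b_{\vec{j}} &= 2^{1/2} \prod_m k_m^{-1/2} \sum_{\vec{x}} \sin\bigl(2\pi (j_1 x_1 / k_1 + j_2 x_2 / k_2 + \cdots + j_n x_n / k_n)\bigr) f(\vec{x}).
\end{aligned}
\tag{B.7}
$$

$$
\begin{aligned}
f(\vec{x}) = \prod_m k_m^{-1/2} \Bigl( a_{\vec{0}} + \sum_{\vec{j}} \bigl\{ & a_{\vec{j}}\, 2^{1/2} \cos\bigl(2\pi (j_1 x_1 / k_1 + j_2 x_2 / k_2 + \cdots + j_n x_n / k_n)\bigr) \\
& + b_{\vec{j}}\, 2^{1/2} \sin\bigl(2\pi (j_1 x_1 / k_1 + j_2 x_2 / k_2 + \cdots + j_n x_n / k_n)\bigr) \bigr\} \Bigr),
\end{aligned}
\tag{B.8}
$$

where the sum runs over one index from each pair $\{\vec{j}, -\vec{j}\}$ of nonzero indices, and any
index that equals its own negative (possible only when some $k_m$ is even) is treated as
$j = k/2$ is in (B.4) and (B.5).

This method has the disadvantage that the Walsh functions in $n$ dimensions are not just
the product of Walsh functions in one dimension. Explicitly, they are the following:

Definition 11 (Real $\vec{k}$-ary Walsh Functions)

$$
\begin{aligned}
\psi^{(c)}_{\vec{0}}(\vec{x}) &= \prod_m k_m^{-1/2}; \\
\psi^{(c)}_{\vec{j}}(\vec{x}) &= 2^{1/2} \prod_m k_m^{-1/2} \cos\bigl(2\pi (j_1 x_1 / k_1 + j_2 x_2 / k_2 + \cdots + j_n x_n / k_n)\bigr) \quad \text{for } \vec{j} \neq \vec{0}; \\
\psi^{(s)}_{\vec{j}}(\vec{x}) &= 2^{1/2} \prod_m k_m^{-1/2} \sin\bigl(2\pi (j_1 x_1 / k_1 + j_2 x_2 / k_2 + \cdots + j_n x_n / k_n)\bigr).
\end{aligned}
\tag{B.9}
$$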


This transform also has the useful property that function averages over schemata
with $m$ fixed positions become sums over those $m$ positions in the transformed space.


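Example Consider again a function $f$ over strings of length 2 whose characters are
taken from a ternary alphabet. Some schema averages in terms of the transform
coefficients are

$$
\begin{aligned}
f((*,*)) &= \frac{1}{3}\, a_{(0,0)}; \\
f((*,0)) &= \frac{1}{3} \left( a_{(0,0)} + 2^{1/2} a_{(0,1)} \right); \\
f((*,1)) &= \frac{1}{3} \left( a_{(0,0)} - \frac{1}{2^{1/2}}\, a_{(0,1)} + \frac{3^{1/2}}{2^{1/2}}\, b_{(0,1)} \right); \\
f((*,2)) &= \frac{1}{3} \left( a_{(0,0)} - \frac{1}{2^{1/2}}\, a_{(0,1)} - \frac{3^{1/2}}{2^{1/2}}\, b_{(0,1)} \right).
\end{aligned}
\tag{B.10}
$$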


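Theorem 7 Consider a schema with $m$ fixed characters $p_i$ at positions $j_i$. Then the
average of $f$ over that schema is

$$
\begin{aligned}
\prod_q k_q^{-1/2} \Bigl( a_{\vec{0}} + 2^{1/2} \sum_{l_1, l_2, \ldots, l_m} \bigl[ & \cos\bigl(2\pi (l_1 p_1 / k_{j_1} + l_2 p_2 / k_{j_2} + \cdots + l_m p_m / k_{j_m})\bigr)\,
a_{(0,\ldots,0,\,l_1,\,0,\ldots,0,\,l_2,\,0,\ldots,0,\,l_m,\,0,\ldots)} \\
& + \sin\bigl(2\pi (l_1 p_1 / k_{j_1} + l_2 p_2 / k_{j_2} + \cdots + l_m p_m / k_{j_m})\bigr)\,
b_{(0,\ldots,0,\,l_1,\,0,\ldots,0,\,l_2,\,0,\ldots,0,\,l_m,\,0,\ldots)} \bigr] \Bigr),
\end{aligned}
\tag{B.11}
$$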